GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add standard deviation #541

Closed vnmabus closed 1 year ago

vnmabus commented 1 year ago

Add a method for computing the standard deviation of functional data, for both the discretized and the basis expansion representations.

pcuestas commented 1 year ago

One design question about the std function needs to be settled: which normalization coefficient to apply, and whether the choice should be left to the user.

The definition of std provided in Kokoszka and Reimherr (2017) is:

$$\left(\mathrm{std}_X(t)\right)^2 = \frac{1}{N} \sum_{n=1}^{N} \left(X_n(t) - \overline{X}(t)\right)^2.$$

This normalization by $N$ is the default used in numpy.var and numpy.std, the latter being the most natural function to use in the implementation of FDataGrid.std:

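import numpy as np
from skfda import FDataGrid

def std(X: FDataGrid) -> FDataGrid:
    # np.std defaults to ddof=0, so each grid point is normalized by N.
    return X.copy(
        data_matrix=np.array([np.std(X.data_matrix, axis=0)]),
        sample_names=("standard deviation",),
    )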
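As a quick check of the claim about the default normalization, the sketch below compares the $1/N$ formula with `np.std` on a plain array. The toy `data` matrix (rows are samples, columns are grid points) is made up for illustration and does not come from scikit-fda.

```python
import numpy as np

# Hypothetical toy data: 3 curves sampled at 4 grid points (rows = samples).
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 6.0, 9.0, 12.0],
])

n_samples = data.shape[0]
mean = data.mean(axis=0)

# Pointwise standard deviation with the 1/N normalization of the formula above.
manual_std = np.sqrt(((data - mean) ** 2).sum(axis=0) / n_samples)

# np.std uses ddof=0 by default, which is the same 1/N divisor.
numpy_std = np.std(data, axis=0)

print(np.allclose(manual_std, numpy_std))
```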

However, the easiest implementation of FDataBasis.std uses the FDataBasis.cov method. FDataBasis.cov calculates the covariance using the formula:

$$K_X(t, s) = \frac{1}{N-1} \sum_{n=1}^{N} \left(X_n(t) - \overline{X}(t)\right)\left(X_n(s) - \overline{X}(s)\right),$$

because $(N-1)$ is the default normalization used by numpy.cov.
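The mismatch between the two defaults can be seen directly in NumPy. The sketch below uses made-up random data on a plain array rather than scikit-fda objects. It shows that the diagonal of `np.cov` matches `np.var` with `ddof=1`, and matches the default `np.var` only after rescaling by $(N-1)/N$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 curves evaluated at 3 grid points (rows = samples).
data = rng.normal(size=(5, 3))
n_samples = data.shape[0]

# np.cov divides by N - 1 by default; rowvar=False treats columns as variables.
cov_default = np.cov(data, rowvar=False)

var_n = np.var(data, axis=0)                  # divisor N (ddof=0, the default)
var_n_minus_1 = np.var(data, axis=0, ddof=1)  # divisor N - 1

# The covariance diagonal agrees with the ddof=1 variance, not the default one.
diag_matches_ddof1 = np.allclose(np.diag(cov_default), var_n_minus_1)
diag_matches_ddof0 = np.allclose(np.diag(cov_default), var_n)

# Rescaling by (N - 1) / N recovers the 1/N normalization.
rescaled = np.allclose(np.diag(cov_default) * (n_samples - 1) / n_samples, var_n)

print(diag_matches_ddof1, diag_matches_ddof0, rescaled)
```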

A natural solution to this issue would be to make the signature of std similar to that of numpy.std, where there is a parameter:

ddof: int, optional. Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

But including this ddof parameter in std would require adding a similar one to the cov function.
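A minimal sketch of what a shared `ddof` parameter could look like, written on plain NumPy arrays rather than on the scikit-fda classes. The helpers `pointwise_cov` and `pointwise_std` are hypothetical names for illustration, and the default `ddof=1` is an arbitrary choice here, not a decision taken in this issue. The standard deviation is derived from the covariance, so both use the same divisor.

```python
import numpy as np


def pointwise_cov(data: np.ndarray, ddof: int = 1) -> np.ndarray:
    """Hypothetical helper: covariance on the grid with divisor N - ddof."""
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (data.shape[0] - ddof)


def pointwise_std(data: np.ndarray, ddof: int = 1) -> np.ndarray:
    """Hypothetical helper: std as the square root of the covariance diagonal."""
    return np.sqrt(np.diag(pointwise_cov(data, ddof=ddof)))


# Made-up data: 3 curves evaluated at 2 grid points (rows = samples).
data = np.array([[0.0, 1.0], [2.0, 5.0], [4.0, 3.0]])

std_population = pointwise_std(data, ddof=0)  # divisor N
std_sample = pointwise_std(data, ddof=1)      # divisor N - 1

print(std_population, std_sample)
```

Because the same `ddof` flows through both helpers, `pointwise_std(data, ddof=0)` agrees with `np.std(data, axis=0)` and `pointwise_cov(data)` agrees with `np.cov(data, rowvar=False)`.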

pcuestas commented 1 year ago

I closed this issue by accident. I'm sorry.