Closed vnmabus closed 1 year ago
One design question for the `std` function needs to be settled: which normalization coefficient to apply, and whether the user should be able to choose it.
The definition of `std` provided in Kokoszka and Reimherr (2017) is:
$$\left(\operatorname{std}_X(t)\right)^2 = \frac{1}{N} \sum_{n=1}^{N} \left(X_n(t) - \overline{X}(t)\right)^2.$$
This normalization by $N$ is the default used in `numpy.var` and `numpy.std`, the latter being the most natural function to use in the implementation of `FDataGrid.std`:
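```python
import numpy as np
from skfda import FDataGrid


def std(X: FDataGrid) -> FDataGrid:
    # np.std uses ddof=0 by default, i.e. it divides by N.
    return X.copy(
        data_matrix=np.array([np.std(X.data_matrix, axis=0)]),
        sample_names=("standard deviation",),
    )
```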
However, the easiest implementation of `FDataBasis.std` uses the `FDataBasis.cov` method. `FDataBasis.cov` calculates the covariance using the formula:
$$K_X(t, s) = \frac{1}{N-1} \sum_{n=1}^{N} \left(X_n(t) - \overline{X}(t)\right)\left(X_n(s) - \overline{X}(s)\right),$$
because $(N-1)$ is the default normalization used by `numpy.cov`.
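The mismatch between the two defaults can be checked directly in NumPy. The following is a minimal sketch on a made-up array (the data and variable names are illustrative assumptions, not part of scikit-fda); it only shows that the standard deviation obtained from the diagonal of `numpy.cov` differs from `numpy.std` by the factor $\sqrt{(N-1)/N}$.

```python
import numpy as np

# Toy data (assumed for illustration): N = 4 samples (rows)
# evaluated at 3 grid points (columns).
X = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 1.0],
])
N = X.shape[0]

# numpy.std defaults to ddof=0, so it divides by N.
std_default = np.std(X, axis=0)

# numpy.cov divides by N - 1 by default; rowvar=False treats
# each column (grid point) as a variable.
cov_default = np.cov(X, rowvar=False)
std_from_cov = np.sqrt(np.diag(cov_default))

# The two pointwise estimates differ by a constant factor.
ratio = std_default / std_from_cov
print(np.allclose(ratio, np.sqrt((N - 1) / N)))
```

With both functions left at their defaults, a `std` built on `numpy.std` and one built on the covariance diagonal would therefore disagree for any finite $N$.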
A natural solution to this issue would be to make the signature of `std` similar to that of `numpy.std`, which has the parameter:

> `ddof`: int, optional. Means Delta Degrees of Freedom. The divisor used in calculations is `N - ddof`, where `N` represents the number of elements. By default `ddof` is zero.
But including this `ddof` parameter in `std` would require adding a similar one to the `cov` function.
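As a rough illustration of how a shared `ddof` parameter could keep the two functions consistent, here is a sketch that works on plain NumPy arrays rather than on `FDataGrid` or `FDataBasis` objects. The function names `std_pointwise` and `cov_pointwise` and their default values are hypothetical and are not the scikit-fda API; the defaults simply mirror those of `numpy.std` and `numpy.cov`.

```python
import numpy as np


def std_pointwise(data_matrix, ddof=0):
    """Hypothetical helper: pointwise std with divisor N - ddof."""
    return np.std(data_matrix, axis=0, ddof=ddof)


def cov_pointwise(data_matrix, ddof=1):
    """Hypothetical helper: covariance matrix with divisor N - ddof."""
    # rowvar=False: rows are samples, columns are grid points.
    return np.cov(data_matrix, rowvar=False, ddof=ddof)


# Made-up data: 5 samples evaluated at 4 grid points.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

# When the same ddof is passed to both, the std equals the square
# root of the covariance diagonal.
for ddof in (0, 1):
    from_cov = np.sqrt(np.diag(cov_pointwise(X, ddof=ddof)))
    print(ddof, np.allclose(std_pointwise(X, ddof=ddof), from_cov))
```

Whether the two defaults should be unified (and to which value) remains the open design decision; the sketch only shows that passing the same `ddof` to both removes the discrepancy.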
I closed this issue by accident. I'm sorry.
Add a method for computing the standard deviation of functional data in both discretized and basis-expansion representations.