GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Covariance function as tensor product #505

Closed m5signorini closed 1 year ago

m5signorini commented 1 year ago

This PR replaces the cov method of FDataBasis so that it uses a tensor basis instead of a numerical approximation. The method should compute the unbiased covariance function, which, for a sample of univariate functions $\{x_n\}_{n=1}^N$, is defined as

$$K(t,s) =\frac{1}{N-1} \sum_{n=1}^N (x_n(t) - \bar{x}(t))(x_n(s) - \bar{x}(s)).$$

If the functions $x_n$ are expressed in a basis $\{\phi_j\}_{j=1}^J$ such that $x_n(t) = \sum_{j=1}^J \alpha_j^{(n)}\phi_j(t)$, then

$$ K(t,s) = \frac{1}{N-1} \sum_{n=1}^N \left( \sum_{j=1}^J(\alpha_j^{(n)} - \bar{\alpha}_j)\phi_j(t)\right) \left( \sum_{j=1}^J(\alpha_j^{(n)} - \bar{\alpha}_j)\phi_j(s)\right), $$

which reduces to computing the covariance matrix of the centered coefficients $\alpha$. Thus,

$$ K(t,s) = \Phi(t)^T C \Phi(s), $$

where $\Phi(t) = (\phi_1(t), \dots, \phi_J(t))^T$ is the column vector of basis functions and $C$ is the unbiased sample covariance matrix of the coefficients. Hence $K$ has the entries of $C$ as its coefficients in the tensor basis $\{\phi_1(t)\phi_1(s),\quad \phi_1(t)\phi_2(s), \quad \cdots, \quad \phi_J(t)\phi_J(s)\}$.
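The identity $K(t,s) = \Phi(t)^T C \Phi(s)$ can be checked numerically. The following is a minimal NumPy sketch, not the scikit-fda implementation: the monomial basis, the helper `basis_matrix`, and the random coefficients are all assumptions made for illustration. It compares the tensor-basis expression with the covariance computed directly from the definition on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)


def basis_matrix(points, n_basis):
    """Evaluate a hypothetical monomial basis 1, t, ..., t^(J-1).

    Returns an array of shape (n_basis, len(points)).
    """
    return np.vander(points, N=n_basis, increasing=True).T


N, J = 20, 4                                  # sample size and basis size
coefficients = rng.normal(size=(N, J))        # alpha_j^(n), one row per function
t = np.linspace(0, 1, 7)
s = np.linspace(0, 1, 5)

# Unbiased sample covariance matrix of the coefficients, shape (J, J).
C = np.cov(coefficients, rowvar=False, ddof=1)

# Tensor-basis form: K(t, s) = Phi(t)^T C Phi(s) on the grid t x s.
K_basis = basis_matrix(t, J).T @ C @ basis_matrix(s, J)

# Direct form: evaluate every function, center pointwise, average with 1/(N-1).
X_t = coefficients @ basis_matrix(t, J)       # shape (N, len(t))
X_s = coefficients @ basis_matrix(s, J)       # shape (N, len(s))
K_direct = (X_t - X_t.mean(axis=0)).T @ (X_s - X_s.mean(axis=0)) / (N - 1)

print(np.allclose(K_basis, K_direct))
```

The two results agree up to floating-point error because centering the evaluated functions is the same as centering the coefficients before applying the basis.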

m5signorini commented 1 year ago

Modifying the cov method affects tests that assumed an FDataGrid as its result. I believe this can be solved by converting the result of cov with to_grid in those cases.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 91.12% and project coverage change: +0.05 :tada:

Comparison is base (fbe63ce) 85.77% compared to head (4c411e4) 85.83%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop     #505      +/-   ##
===========================================
+ Coverage    85.77%   85.83%   +0.05%
===========================================
  Files          145      146       +1
  Lines        11500    11596      +96
===========================================
+ Hits          9864     9953      +89
- Misses        1636     1643       +7
```

| [Impacted Files](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM) | Coverage Δ | |
|---|---|---|
| [skfda/exploratory/stats/\_stats.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvZXhwbG9yYXRvcnkvc3RhdHMvX3N0YXRzLnB5) | `86.36% <66.66%> (ø)` | |
| [skfda/representation/\_functional\_data.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvcmVwcmVzZW50YXRpb24vX2Z1bmN0aW9uYWxfZGF0YS5weQ==) | `83.15% <66.66%> (-0.42%)` | :arrow_down: |
| [skfda/exploratory/stats/covariance/\_empirical.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvZXhwbG9yYXRvcnkvc3RhdHMvY292YXJpYW5jZS9fZW1waXJpY2FsLnB5) | `75.00% <75.00%> (+5.00%)` | :arrow_up: |
| [skfda/representation/basis/\_fdatabasis.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvcmVwcmVzZW50YXRpb24vYmFzaXMvX2ZkYXRhYmFzaXMucHk=) | `81.02% <83.33%> (-0.05%)` | :arrow_down: |
| [skfda/representation/grid.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvcmVwcmVzZW50YXRpb24vZ3JpZC5weQ==) | `86.25% <83.33%> (-0.02%)` | :arrow_down: |
| [skfda/inference/anova/\_anova\_oneway.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvaW5mZXJlbmNlL2Fub3ZhL19hbm92YV9vbmV3YXkucHk=) | `89.15% <85.71%> (ø)` | |
| [skfda/misc/covariances.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvbWlzYy9jb3ZhcmlhbmNlcy5weQ==) | `84.03% <95.83%> (+1.49%)` | :arrow_up: |
| [skfda/exploratory/stats/covariance/\_base.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvZXhwbG9yYXRvcnkvc3RhdHMvY292YXJpYW5jZS9fYmFzZS5weQ==) | `81.81% <100.00%> (+0.86%)` | :arrow_up: |
| [...ploratory/stats/covariance/\_parametric\_gaussian.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvZXhwbG9yYXRvcnkvc3RhdHMvY292YXJpYW5jZS9fcGFyYW1ldHJpY19nYXVzc2lhbi5weQ==) | `100.00% <100.00%> (ø)` | |
| [skfda/inference/hotelling/\_hotelling.py](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvaW5mZXJlbmNlL2hvdGVsbGluZy9faG90ZWxsaW5nLnB5) | `91.78% <100.00%> (+0.11%)` | :arrow_up: |
| ... and [2 more](https://app.codecov.io/gh/GAA-UAM/scikit-fda/pull/505?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM) | | |


m5signorini commented 1 year ago

The latest commits overload the cov method, adding the option of passing evaluation points. As I understand it, with @overload only the last (undecorated) definition serves as the implementation, and taking the overloaded signatures of the parent class into account is tricky, so I had to repeat the same signatures in both FDataBasis and FDataGrid. I don't know whether there is a better way to overload the method than accepting optional None values and checking them at runtime.
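For reference, the pattern described above can be sketched as follows. This is only an illustration under stated assumptions: `ToyFData`, `Kernel`, and the parameter names `s_points` and `t_points` are hypothetical and are not the scikit-fda classes or signatures. The `@overload` stubs exist only for the type checker; the final undecorated definition is the one that runs, and it distinguishes the two call forms by checking for `None`.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence, overload

# Hypothetical type of a covariance function of two scalar arguments.
Kernel = Callable[[float, float], float]


class ToyFData:
    """Hypothetical stand-in for a functional data class."""

    def __init__(self, kernel: Kernel) -> None:
        self._kernel = kernel

    @overload
    def cov(self) -> Kernel: ...

    @overload
    def cov(
        self,
        s_points: Sequence[float],
        t_points: Sequence[float],
    ) -> list[list[float]]: ...

    def cov(
        self,
        s_points: Optional[Sequence[float]] = None,
        t_points: Optional[Sequence[float]] = None,
    ):
        # Only this definition exists at runtime; the stubs above are typing-only.
        if s_points is None and t_points is None:
            return self._kernel
        if s_points is None or t_points is None:
            raise ValueError("Pass both s_points and t_points, or neither.")
        # Evaluate the covariance function on the grid s_points x t_points.
        return [[self._kernel(s, t) for t in t_points] for s in s_points]


fd = ToyFData(lambda s, t: s * t)
kernel = fd.cov()                      # no points: the function itself
matrix = fd.cov([1.0, 2.0], [3.0])     # points given: evaluated values
```

A subclass that overrides `cov` would typically need to repeat the `@overload` stubs for type checkers to keep both signatures, which matches the duplication mentioned in the comment.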

m5signorini commented 1 year ago

The latest commits aim to eliminate the use of cov() and instead use cov(s,t) to evaluate the covariance function. These changes are applied only where the data matrix is needed. The only other calls to cov() appear in CovarianceEstimator, where for now an FData object is expected.

vnmabus commented 1 year ago

@pcuestas You needed this PR, so you can now start.