GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License
287 stars 51 forks

Feature/fpca_regression #466

Closed Ddelval closed 1 year ago

Ddelval commented 1 year ago

Add the FPCA regression estimator

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 99.01% and project coverage change: +0.11 :tada:

Comparison is base (48d1fae) 85.51% compared to head (5ab79ca) 85.63%.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop     #466      +/-   ##
===========================================
+ Coverage    85.51%   85.63%   +0.11%
===========================================
  Files          141      143       +2
  Lines        11284    11384     +100
===========================================
+ Hits          9650     9749      +99
- Misses        1634     1635       +1
```

| [Impacted Files](https://codecov.io/gh/GAA-UAM/scikit-fda/pull/466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM) | Coverage Δ | |
|---|---|---|
| [skfda/tests/test\_fpca\_regression.py](https://codecov.io/gh/GAA-UAM/scikit-fda/pull/466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvdGVzdHMvdGVzdF9mcGNhX3JlZ3Jlc3Npb24ucHk=) | `98.14% <98.14%> (ø)` | |
| [...da/misc/operators/\_linear\_differential\_operator.py](https://codecov.io/gh/GAA-UAM/scikit-fda/pull/466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvbWlzYy9vcGVyYXRvcnMvX2xpbmVhcl9kaWZmZXJlbnRpYWxfb3BlcmF0b3IucHk=) | `95.91% <100.00%> (+0.17%)` | :arrow_up: |
| [skfda/ml/regression/\_fpca\_regression.py](https://codecov.io/gh/GAA-UAM/scikit-fda/pull/466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvbWwvcmVncmVzc2lvbi9fZnBjYV9yZWdyZXNzaW9uLnB5) | `100.00% <100.00%> (ø)` | |
| [skfda/representation/basis/\_custom\_basis.py](https://codecov.io/gh/GAA-UAM/scikit-fda/pull/466?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GAA-UAM#diff-c2tmZGEvcmVwcmVzZW50YXRpb24vYmFzaXMvX2N1c3RvbV9iYXNpcy5weQ==) | `86.25% <100.00%> (+0.53%)` | :arrow_up: |


Ddelval commented 1 year ago

> I think this functionality is implemented in the package fda.usc. We should have a test verifying that we obtain the same results.

I have (finally) added this test. However, while going back and forth between Python and R, I stumbled upon a Python package (rpy2) that provides a simple way to execute R code from within Python. I am unsure whether this would be useful for the tests. On the one hand, it would ensure that the tests compare against the correct values, and it would force us to include the R source used to obtain the reference data. On the other hand, it would make the tests considerably slower and require changes to the GitHub Actions workflows.

I think the timing tradeoff is too significant (I measured a slowdown of around 6x). Nevertheless, I thought I would mention it in case it comes in handy at some point. The following snippet generates the reference data for the test I just added.

import numpy as np
from rpy2 import robjects

Rresult = robjects.r('''
library("fda.usc")
data(tecator)
# Fit the regression model with the first 129 observations
x=tecator$absorp.fdata[1:129,]
y=tecator$y$Fat[1:129]
res2=fregre.pc(x,y,l=1:10)

# Predict the response for the remaining observations
n = length(tecator$y$Fat)
xnew=tecator$absorp.fdata[130:n,]
result = predict(res2, xnew)
names(result) = NULL

# Output the predicted values
paste(
    round(result,8),
    collapse = ", "
)
''')
# The R code returns a single ", "-joined string; split it into a float array
r_predictions = np.array(Rresult[0].split(", "), dtype=np.float64)
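
Since the R code rounds the predictions to 8 decimals before returning them, any comparison against these reference values needs an absolute tolerance that accounts for the rounding. Below is a minimal, standard-library-only sketch of that idea. The helper names `parse_r_vector` and `matches_reference` are hypothetical (they are not part of scikit-fda or rpy2), and the numbers in the usage lines are made up for illustration, not actual fda.usc output.

```python
import math
from typing import Sequence


def parse_r_vector(text: str) -> list:
    """Parse the string built by R's paste(..., collapse = ", ") into floats."""
    return [float(token) for token in text.split(", ")]


def matches_reference(
    predicted: Sequence[float],
    reference: Sequence[float],
    decimals: int = 8,
) -> bool:
    """Check predictions against reference values rounded to `decimals` places.

    Rounding alone introduces an error of up to 0.5 * 10**-decimals, so the
    absolute tolerance is set to 10**-decimals to leave some slack.
    """
    atol = 10.0 ** -decimals
    if len(predicted) != len(reference):
        return False
    return all(
        math.isclose(p, r, rel_tol=0.0, abs_tol=atol)
        for p, r in zip(predicted, reference)
    )


# Illustrative values only (hypothetical, not real model output)
reference = parse_r_vector("1.5, 2.25")
print(matches_reference([1.500000001, 2.25], reference))
print(matches_reference([1.5001, 2.25], reference))
```

In the actual test, `np.testing.assert_allclose` with a comparable `atol` would presumably do the same job; the point is only that the tolerance should not be tighter than the rounding applied on the R side.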