GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License

FPCA on FDataIrregular #613

Open ooodragon94 opened 2 months ago

ooodragon94 commented 2 months ago

Motivation

I'm trying to apply FPCA to functional data where each function maps R^3 -> R.

Basically following this: https://fda.readthedocs.io/en/stable/auto_examples/plot_fpca_inverse_transform_outl_detection.html#sphx-glr-auto-examples-plot-fpca-inverse-transform-outl-detection-py

I have a custom dataset built with FDataIrregular; however, FPCA does not seem to work on FDataIrregular.

Desired functionality

FPCA on FDataIrregular for outlier detection on irregular functions.

Alternatives

No response

Additional context

No response

eliegoudout commented 2 months ago

I believe that at the moment, even for FDataGrid, the package doesn't provide out-of-the-box FPCA for $\mathbb{R}^n\rightarrow\mathbb{R}$ when $n\geqslant 2$. For further information and a potential workaround, I'd point to #512.

ooodragon94 commented 2 months ago

@eliegoudout
Thank you for your reply. That's bad news... :( Could any other library work (e.g. https://fdasrsf-python.readthedocs.io/en/latest/fPCA.html)? If FPCA at least works on a regular grid, I think I can interpolate my data onto one first.
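For reference, the interpolation step mentioned above could look roughly like the sketch below. It is only an illustration for curves with a one-dimensional domain, using `np.interp`; the helper `resample_to_grid` and the sample curves are made up for the example, and data on R^3 would need a multidimensional interpolator instead (for instance `scipy.interpolate.griddata`).

```python
import numpy as np


def resample_to_grid(points, values, grid):
    """Linearly interpolate one irregularly sampled curve onto `grid`."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(points)  # np.interp expects increasing sample points
    return np.interp(grid, points[order], values[order])


# Two hypothetical curves observed at different, unevenly spaced points.
curves = [
    ([0.0, 0.3, 1.0], [0.0, 0.3, 1.0]),
    ([1.0, 0.0, 0.5, 0.2], [2.0, 0.0, 1.0, 0.4]),
]

# Common regular grid shared by every curve after resampling.
grid = np.linspace(0.0, 1.0, 5)
data_matrix = np.vstack([resample_to_grid(p, v, grid) for p, v in curves])
```

The resulting `data_matrix` has one row per curve and one column per grid point, which is the layout that `skfda.FDataGrid(data_matrix, grid_points=grid)` accepts.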

eliegoudout commented 2 months ago

I am personally not familiar enough with this package to answer your question, sorry. Someone else might!

ooodragon94 commented 2 months ago

As eliegoudout suggested, I'm trying out the method explained in https://github.com/GAA-UAM/scikit-fda/discussions/512, where I run FPCA on one-dimensional functions (R -> R) and then add up the errors over the other dimensions.
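A minimal sketch of that idea, as I understand it: treat each one-dimensional slice separately, reconstruct it from the first q components, and sum the squared reconstruction errors over the slices. This uses plain PCA via SVD on discretized values in NumPy rather than scikit-fda's FPCA, and the array layout `(n_samples, n_slices, n_points)`, the function names, and the random data are all assumptions made for the example.

```python
import numpy as np


def pca_reconstruction_error(X, q):
    """Squared reconstruction error per sample using the top-q components."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:q]
    recon = Xc @ components.T @ components
    return ((Xc - recon) ** 2).sum(axis=1)


def summed_slice_error(data, q):
    """Sum per-slice errors; data has shape (n_samples, n_slices, n_points)."""
    return sum(
        pca_reconstruction_error(data[:, j, :], q)
        for j in range(data.shape[1])
    )


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 4, 30))  # synthetic stand-in data

errors_q2 = summed_slice_error(data, q=2)
errors_q5 = summed_slice_error(data, q=5)
```

Samples with an unusually large summed error would be the outlier candidates. Keeping more components can only shrink each sample's error, which gives a quick sanity check on the pipeline.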

However, I'm facing the error "numpy.linalg.LinAlgError: Matrix is not positive definite" when doing:

q = 2
fpca_clean = FPCA(n_components=q)
fpca_clean.fit(fd)

It seems to be an error raised while doing the inverse transform for PCA. When does it work and when does it not?

ooodragon94 commented 2 months ago

Looking at the source code, the _weights variable seems to play an important role in the Cholesky decomposition (the error described above). When it is not given, FPCA seems to initialize it to a list of zeros with the length of a function. I simply passed ones instead of zeros when initializing FPCA. It seems to work as expected, since increasing q (the number of components to keep in PCA) gives a smaller error.

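import numpy as np
from skfda.preprocessing.dim_reduction import FPCA

q = 5  # number of principal components to keep
function_len = train_data[0].data_matrix.squeeze().shape  # shape of one sampled function
fpca_clean = FPCA(n_components=q, _weights=np.ones(function_len))
fpca_clean.fit(train_data)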
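To illustrate why the weights would matter for that error: `np.linalg.cholesky` only succeeds on a positive definite matrix, so a diagonal weight matrix with zeros on the diagonal fails while one with ones succeeds. The snippet below only shows this NumPy behaviour; whether FPCA builds its matrix exactly this way internally is my assumption from reading the source, and `cholesky_succeeds` is a helper made up for the example.

```python
import numpy as np


def cholesky_succeeds(weights):
    """Return True if diag(weights) admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(np.diag(weights))
        return True
    except np.linalg.LinAlgError:
        # Raised with "Matrix is not positive definite".
        return False


zeros_ok = cholesky_succeeds(np.zeros(4))  # all-zero weights
ones_ok = cholesky_succeeds(np.ones(4))    # uniform weights of one
```

Note that uniform weights of one amount to a plain unweighted sum over the grid points rather than a quadrature rule, so the resulting components may be scaled differently from what properly integrated weights would give.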