GAA-UAM / scikit-fda

Functional Data Analysis Python package
https://fda.readthedocs.io
BSD 3-Clause "New" or "Revised" License
300 stars, 54 forks

Example fpca.inverse_transform to detect outliers #389

Closed Clej closed 2 years ago

Clej commented 2 years ago

I propose an example that uses inverse_transform in FPCA to detect outliers via the reconstruction error (RE), i.e. the relative distance between an input and its reconstruction from the eigenspace. Given a fitted fpca, a threshold, and a test sample x, the decision rule is:

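from skfda.misc.metrics import lp_distance, lp_norm

# Project x onto the fitted eigenspace, then map it back to a function
x_hat = fpca.inverse_transform(fpca.transform(x))
# Relative L2 reconstruction error
RE = lp_distance(x, x_hat, p=2) / lp_norm(x, p=2)
if RE >= threshold:
    print('x is an outlier')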
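To make the rule concrete without depending on a fitted fpca object, here is a minimal sketch in plain NumPy on discretized curves. It is a stand-in for illustration, not the scikit-fda implementation: fit_pca and reconstruction_error are hypothetical helpers, the Euclidean norm of the sampled values only approximates the L2 norm of the functions, and the grid, basis functions, and threshold value are all assumptions.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the mean curve and the leading principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(x, mean, components):
    """Relative error between x and its projection onto the eigenspace."""
    scores = (x - mean) @ components.T   # plays the role of transform
    x_hat = mean + scores @ components   # plays the role of inverse_transform
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)


t = np.linspace(0, 1, 50)
rng = np.random.default_rng(0)

# Training curves: random combinations of two smooth basis functions.
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X_train = rng.normal(size=(100, 2)) @ basis
mean, components = fit_pca(X_train, n_components=2)

x_inlier = 0.5 * basis[0] - 1.5 * basis[1]  # lies in the span of the training data
x_outlier = np.sin(10 * np.pi * t)          # a frequency absent from training

threshold = 0.1  # arbitrary illustrative value
re_inlier = reconstruction_error(x_inlier, mean, components)
re_outlier = reconstruction_error(x_outlier, mean, components)

for name, re in [("x_inlier", re_inlier), ("x_outlier", re_outlier)]:
    label = "outlier" if re >= threshold else "nonoutlier"
    print(f"{name}: RE = {re:.3g} -> {label}")
```

The inlier is recovered almost exactly because it lives in the subspace spanned by the retained components, while the outlier is nearly orthogonal to that subspace, so most of its norm is lost in the reconstruction.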

In the example, we use synthetic data generated from centered Gaussian processes. The nonoutliers are generated with a Gaussian kernel and the outliers with an exponential kernel. The training dataset, which is used to fit the FPCA and set the threshold, contains only nonoutliers. The test samples contain both outliers and nonoutliers.
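The data-generating setup described above could be sketched as follows, again in plain NumPy rather than with scikit-fda's own dataset helpers. The length scales, grid size, sample counts, and the helper sample_gp are assumptions chosen for illustration, not the values used in the proposed example.

```python
import numpy as np


def gaussian_kernel(s, t, length_scale=0.2):
    """Squared-exponential covariance: yields smooth sample paths."""
    return np.exp(-((s - t) ** 2) / (2 * length_scale ** 2))


def exponential_kernel(s, t, length_scale=0.2):
    """Exponential covariance: yields rough sample paths."""
    return np.exp(-np.abs(s - t) / length_scale)


def sample_gp(kernel, grid, n_samples, rng):
    """Draw n_samples paths of a centered Gaussian process on grid."""
    cov = kernel(grid[:, None], grid[None, :])
    # A small jitter keeps the covariance numerically positive definite.
    cov = cov + 1e-8 * np.eye(len(grid))
    return rng.multivariate_normal(np.zeros(len(grid)), cov, size=n_samples)


rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)

# Training set: nonoutliers only (Gaussian kernel).
X_train = sample_gp(gaussian_kernel, grid, 200, rng)

# Test set: nonoutliers (label 0) followed by outliers (label 1).
X_test = np.vstack([
    sample_gp(gaussian_kernel, grid, 50, rng),
    sample_gp(exponential_kernel, grid, 50, rng),
])
y_test = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])

# Mean squared increment as a crude roughness measure for each group.
rough_in = np.mean(np.diff(X_test[y_test == 0], axis=1) ** 2)
rough_out = np.mean(np.diff(X_test[y_test == 1], axis=1) ** 2)
print(f"roughness nonoutliers: {rough_in:.4f}, outliers: {rough_out:.4f}")
```

The two groups share the same mean and marginal variance and differ only in smoothness, which is why a few smooth principal components reconstruct the nonoutliers well but not the outliers.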

I tried to make my explanations as concise and intuitive as possible.

The script generates two figures: (i) the dataset and (ii) the distribution of the REs.

[Figure (i): dataset_fpca_outl]

[Figure (ii): RE_density_fpca_outl]

Finally, we output the correctly detected outliers and nonoutliers.
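As a rough illustration of this last step, the sketch below sets a threshold from training REs and counts the correct detections on a labeled test set. The 0.99 quantile rule is an assumption (the description does not say how the threshold is chosen), detection_counts is a hypothetical helper, and the RE values are made-up toy numbers, not results from the example.

```python
import numpy as np


def detection_counts(re_test, y_test, threshold):
    """Count correctly detected outliers and correctly kept nonoutliers."""
    predicted_outlier = re_test >= threshold
    is_outlier = y_test == 1
    true_outliers = int(np.sum(predicted_outlier & is_outlier))
    true_nonoutliers = int(np.sum(~predicted_outlier & ~is_outlier))
    return true_outliers, true_nonoutliers


# Toy REs of the training curves (nonoutliers only); values are invented.
re_train = np.array([0.02, 0.03, 0.05, 0.04, 0.03])

# Assumed rule: flag anything above the 0.99 quantile of the training REs.
threshold = np.quantile(re_train, 0.99)

# Toy REs and labels for the test curves (1 = outlier, 0 = nonoutlier).
re_test = np.array([0.03, 0.04, 0.30, 0.45, 0.02, 0.06])
y_test = np.array([0, 0, 1, 1, 0, 0])

n_out, n_in = detection_counts(re_test, y_test, threshold)
print(f"correctly detected outliers: {n_out} of {int(y_test.sum())}")
print(f"correctly detected nonoutliers: {n_in} of {int((y_test == 0).sum())}")
```

A higher quantile lowers the number of nonoutliers flagged by mistake at the cost of missing outliers whose RE is only slightly elevated.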

codecov[bot] commented 2 years ago

Codecov Report

Merging #389 (546ea84) into develop (a4e7b04) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff            @@
##           develop     #389   +/-   ##
========================================
  Coverage    80.00%   80.00%           
========================================
  Files           86       86           
  Lines         6928     6928           
========================================
  Hits          5543     5543           
  Misses        1385     1385           
