erdogant / pca

pca: A Python Package for Principal Component Analysis.
https://erdogant.github.io/pca
MIT License

Outlier detection plots #5

Closed (alexjj closed 3 years ago)

alexjj commented 3 years ago

Great library, thanks for sharing! Building all the plots is very convenient :)

This is a feature request to add plots that are useful for outlier detection:

For Hotelling’s T2, this SO post shows one way.

erdogant commented 3 years ago

Great suggestion! Outlier detection is now available in the new version, and an example can be found in the readme file. The current implementation can already rank outliers, so I'm not sure what additional benefit SPE/DmodX scores would bring.

Update with: pip install -U pca

alexjj commented 3 years ago

Awesome!

I've only recently started learning about PCA, but my understanding is that Hotelling's T2 is for strong outliers, i.e. those that significantly impact the PCs, whereas DmodX is for moderate outliers, i.e. ones that are more like a temporary excursion from the model.

Both kinds of outliers are important but could be treated differently. Strong ones may be removed before fitting the model, while moderate ones are deviations to keep but investigate further.
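
To make the distinction concrete, here is a minimal pure-Python sketch of the two scores for a one-component PCA model. It is not the pca library's implementation: the function names (`center`, `covariance`, `first_component`, `t2_and_spe`) and the toy data are assumptions made up for illustration, and power iteration stands in for a proper eigendecomposition. Hotelling's T2 measures how far a sample lies from the center within the model (along the retained component), while SPE measures the squared residual the model cannot explain; DmodX is, as far as I understand it, a scaled variant of that same residual.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=100):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(len(v)))
                     for i in range(len(v)))
    return v, eigenvalue

def t2_and_spe(rows):
    """Hotelling's T2 and SPE per sample for a one-component PCA model."""
    centered = center(rows)
    v, eigenvalue = first_component(covariance(centered))
    t2, spe = [], []
    for row in centered:
        score = sum(x * a for x, a in zip(row, v))          # position along PC1
        residual = [x - score * a for x, a in zip(row, v)]  # part PC1 cannot explain
        t2.append(score ** 2 / eigenvalue)                  # distance inside the model
        spe.append(sum(r * r for r in residual))            # squared distance from the model
    return t2, spe

# Hypothetical toy data: 11 points close to the line y = x ...
data = [[float(t), t + (0.1 if k % 2 == 0 else -0.1)]
        for k, t in enumerate(range(-5, 6))]
data.append([20.0, 20.0])  # index 11: far away, but along the main direction
data.append([-2.0, 2.0])   # index 12: near the center, but off the line

t2, spe = t2_and_spe(data)
print("largest T2 at index", t2.index(max(t2)))
print("largest SPE at index", spe.index(max(spe)))
```

In this toy setup, the point far along the main direction should dominate T2 while leaving almost no residual, and the off-line point should dominate SPE while having a small T2, which is why the two scores can flag different samples.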

erdogant commented 3 years ago

I have included SPE/DmodX in the library. See the readme file for some examples. Note that some of the input parameters have changed, which makes it possible to show SPE/DmodX and Hotelling's T2 outliers separately in the figures.

pip install -U pca

model.biplot(legend=True, SPE=True, hotellingt2=True)

The version should be >= 1.1.2.