erdogant / pca

pca: A Python Package for Principal Component Analysis.
https://erdogant.github.io/pca
MIT License

Is DmodX estimate consistent with SIMCA (Umetrics)? #9

divakarhebbar closed this issue 3 years ago

divakarhebbar commented 3 years ago

Hi Erdogan,

Thanks for building this package! It is much needed in the Python ecosystem. I'm wondering whether the DmodX output by pca.spe_dmodx() is consistent with SIMCA. I'm aware that SIMCA's methods are proprietary, but did you get a chance to compare the results?

Thanks!

erdogant commented 3 years ago

Thanks! I'm not familiar with the SIMCA software, but I believe it is from Sartorius (?). In this library, the DmodX method is an ellipse with a certain standard deviation (default is 2) that is built using the PCA residuals. Every sample outside the ellipse is called an outlier.
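To illustrate the "sample outside the ellipse" idea, here is a minimal sketch in plain NumPy. It is not the library's code: the function name `outside_ellipse`, the choice of two-dimensional input coordinates, and the way the ellipse is derived from the covariance matrix are all assumptions made for illustration.

```python
import numpy as np


def outside_ellipse(points, n_std=2.0):
    """Flag 2-D points that fall outside an n_std covariance ellipse.

    Hypothetical helper: the ellipse is centered on the mean of the points,
    and its semi-axes are n_std standard deviations along the principal
    directions of the point cloud.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)

    # Eigen-decomposition gives the ellipse axes (directions and variances).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Express the centered points in the coordinate system of the axes.
    rotated = (points - center) @ eigvecs

    # Semi-axis lengths: n_std standard deviations along each axis.
    semi_axes = n_std * np.sqrt(eigvals)

    # A point is inside the ellipse when sum((coord / semi_axis)^2) <= 1.
    return np.sum((rotated / semi_axes) ** 2, axis=1) > 1.0
```

With `n_std=2.0` this mirrors the default of 2 mentioned above; a larger value widens the ellipse and flags fewer samples.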

On the SIMCA website, I found a similar description: "The PCA also gives residuals, deviations between the data and the PC model, named DModX. When these residuals are large, this indicates an abnormal behavior in the process. To see this, we make a plot of the residual standard deviation, DModX (residual distance, root mean square). Observations with a DModX larger than the DCrit are outliers. When DModX is twice DCrit they are strong outliers. This indicates that these observations are different from the normal observations with respect to the correlation structure of the variables."

I did not compare the results, but the description seems to follow a similar approach.
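For anyone who wants to check the comparison themselves, the SIMCA description quoted above (a root-mean-square residual distance per observation, compared against a critical value) can be sketched roughly as follows. This is an assumption-laden illustration, not SIMCA's proprietary algorithm and not this library's implementation: the function names, the mean-centering without scaling, the division by K - A degrees of freedom, and the caller-supplied `d_crit` are all hypothetical choices.

```python
import numpy as np


def residual_distance(X, n_components):
    """Per-observation RMS residual after projecting onto a PCA model.

    Hypothetical helper: X has shape (N observations, K variables) and the
    model keeps the first `n_components` (A) principal components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # mean-center only (no scaling, by assumption)

    # SVD of the centered data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T  # shape (K, A)

    # Residuals: the part of each observation the model does not explain.
    residuals = Xc - Xc @ loadings @ loadings.T

    # Root mean square over the K - A remaining degrees of freedom.
    dof = X.shape[1] - n_components
    return np.sqrt(np.sum(residuals ** 2, axis=1) / dof)


def flag_outliers(dmodx, d_crit):
    """Apply the quoted rule: above d_crit is an outlier, at least
    twice d_crit is a strong outlier. d_crit is supplied by the caller."""
    dmodx = np.asarray(dmodx, dtype=float)
    return dmodx > d_crit, dmodx >= 2.0 * d_crit
```

Running the same data through SIMCA and through a sketch like this would show whether the two rankings of observations agree; the absolute values may still differ, because the exact normalization and the way DCrit is derived are not given in this thread.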