Open rajivsam opened 4 years ago
Implement attribute-level drift detection. We already have drift features at the dataset (joint distribution) level; it looks like Azure can also do this at the attribute level. This is not difficult to do. For each attribute, check its type:
(1) If it is continuous (numeric), i.e. the numpy dtype is float, use the two-sample Kolmogorov-Smirnov test to check whether the attribute has the same distribution in the training data and in the data received in deployment: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html
(2) If it is categorical, i.e. the numpy dtype is object, use the chi-square test of independence: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html We need a contingency table to do this. We can build it with the group-by functionality in pandas: https://stackoverflow.com/questions/29901436/is-there-a-pythonic-way-to-do-a-contingency-table-in-pandas
Note: Check https://towardsdatascience.com/how-to-compare-two-distributions-in-practice-8c676904a285 to see if a completely discrete non-parametric test makes sense.
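A rough sketch of the two checks described above, to make the proposal concrete. The function name `attribute_drift`, the `alpha` threshold of 0.05, and the report columns are hypothetical and not part of any existing API in this project. The contingency table is built with `pd.crosstab` rather than an explicit group-by, which is one of the options from the linked Stack Overflow thread. Treating pandas string dtypes the same as object is also an assumption.

```python
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp


def attribute_drift(train: pd.DataFrame, deployed: pd.DataFrame,
                    alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: run one drift test per column shared by both frames."""
    rows = []
    for col in train.columns.intersection(deployed.columns):
        a, b = train[col].dropna(), deployed[col].dropna()

        if pd.api.types.is_float_dtype(a):
            # Continuous attribute: two-sample Kolmogorov-Smirnov test.
            test = "ks_2samp"
            result = ks_2samp(a, b)
            stat, p = result.statistic, result.pvalue
        elif pd.api.types.is_object_dtype(a) or pd.api.types.is_string_dtype(a):
            # Categorical attribute: contingency table of (source x category),
            # then the chi-square test of independence.
            test = "chi2_contingency"
            combined = pd.DataFrame({
                "source": ["train"] * len(a) + ["deployed"] * len(b),
                "value": pd.concat([a, b], ignore_index=True),
            })
            table = pd.crosstab(combined["source"], combined["value"])
            stat, p, _, _ = chi2_contingency(table)
        else:
            # Other dtypes (int, bool, datetime, ...) are skipped in this sketch.
            continue

        rows.append({
            "attribute": col,
            "test": test,
            "statistic": float(stat),
            "p_value": float(p),
            # Assumption: flag drift when the p-value falls below alpha.
            "drift": bool(p < alpha),
        })
    return pd.DataFrame(rows)
```

Calling `attribute_drift(train_df, deployed_df)` would return one row per tested attribute. Integer columns are deliberately left out here because the issue only covers float and object dtypes; how to handle them is an open question.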