Open rygeorge3 opened 3 months ago
When I try to use a custom StatTest in Python, the run function fails with the following error:
See below for the code to reproduce the error.
import pandas as pd
import numpy as np

from scipy.stats import mannwhitneyu
from sklearn import datasets

from evidently.calculations.stattests import StatTest
from evidently.test_suite import TestSuite
from evidently.tests import *

# Dataset for Data Quality and Integrity
adult_data = datasets.fetch_openml(name='adult', version=2, as_frame='auto')
adult = adult_data.frame

adult_ref = adult[~adult.education.isin(['Some-college', 'HS-grad', 'Bachelors'])]
adult_cur = adult[adult.education.isin(['Some-college', 'HS-grad', 'Bachelors'])]

adult_cur.iloc[:2000, 3:5] = np.nan

def _mann_whitney_u(reference_data: pd.Series, current_data: pd.Series, _feature_type: str, threshold: float):
    p_value = mannwhitneyu(np.array(reference_data), np.array(current_data))[1]
    return p_value, p_value < threshold

mann_whitney_stat_test = StatTest(
    name="mann-whitney-u",
    display_name="mann-whitney-u test",
    func=_mann_whitney_u,
    allowed_feature_types=["num"]
)

data_drift_dataset_tests = TestSuite(tests=[
    TestShareOfDriftedColumns(num_stattest=mann_whitney_stat_test),
])

data_drift_dataset_tests.run(reference_data=adult_ref, current_data=adult_cur)
data_drift_dataset_tests
I tried with 0.4.2 and it worked. On 0.4.7 there were some other issues, and 0.4.0 was missing some tests. Something definitely broke in the Python engine when func was made a property.
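Since the behaviour seems to depend on the release, it helps to confirm which version is actually installed before comparing results. A minimal sketch using only the standard library (the helper name installed_version is made up for this example):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed evidently version, or None if it is not installed.
# To try the release reported as working above: pip install evidently==0.4.2
print("evidently:", installed_version("evidently"))
```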