evidentlyai / evidently


Custom StatTest Failing in Python Due to missing Python Engine #1064

Open rygeorge3 opened 3 months ago

rygeorge3 commented 3 months ago

When attempting to implement a custom StatTest in Python, the run function fails with the following error:

See below for the code to reproduce the error.

import pandas as pd
import numpy as np

from scipy.stats import mannwhitneyu
from sklearn import datasets

from evidently.calculations.stattests import StatTest
from evidently.test_suite import TestSuite
from evidently.tests import *

# Dataset for Data Quality and Integrity
adult_data = datasets.fetch_openml(name='adult', version=2, as_frame='auto')
adult = adult_data.frame

adult_ref = adult[~adult.education.isin(['Some-college', 'HS-grad', 'Bachelors'])]
adult_cur = adult[adult.education.isin(['Some-college', 'HS-grad', 'Bachelors'])]

adult_cur.iloc[:2000, 3:5] = np.nan

def _mann_whitney_u(reference_data: pd.Series, current_data: pd.Series, _feature_type: str, threshold: float):
    p_value = mannwhitneyu(np.array(reference_data), np.array(current_data))[1]
    return p_value, p_value < threshold

mann_whitney_stat_test = StatTest(
    name="mann-whitney-u",
    display_name="mann-whitney-u test",
    func=_mann_whitney_u,
    allowed_feature_types=["num"]
)

data_drift_dataset_tests = TestSuite(tests=[
    TestShareOfDriftedColumns(num_stattest=mann_whitney_stat_test),
])

data_drift_dataset_tests.run(reference_data=adult_ref, current_data=adult_cur)
data_drift_dataset_tests
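One way to narrow this down is to call the custom function directly, without going through evidently at all. The sketch below is not part of the original report: it uses synthetic data and a fixed seed (both assumptions made here for illustration) and the same Mann-Whitney logic as `_mann_whitney_u`. If this runs cleanly, the function itself is probably fine and the failure more likely comes from how the library wraps it.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def mann_whitney_u(reference_data: pd.Series, current_data: pd.Series,
                   feature_type: str, threshold: float):
    """Same logic as the custom function in the report.

    Returns (p_value, drift_detected).
    """
    p_value = mannwhitneyu(np.array(reference_data), np.array(current_data))[1]
    return p_value, p_value < threshold


# Synthetic data (an assumption for illustration, not the adult dataset).
rng = np.random.default_rng(0)
reference = pd.Series(rng.normal(loc=0.0, scale=1.0, size=500))
shifted = pd.Series(rng.normal(loc=1.0, scale=1.0, size=500))

# Comparing a sample with itself should not flag drift.
same_p, same_drift = mann_whitney_u(reference, reference, "num", 0.05)

# A sample shifted by one standard deviation should flag drift.
shift_p, shift_drift = mann_whitney_u(reference, shifted, "num", 0.05)

print(f"identical samples: p={same_p:.3g}, drift={bool(same_drift)}")
print(f"shifted sample:    p={shift_p:.3g}, drift={bool(shift_drift)}")
```

Note that `mannwhitneyu` propagates NaN by default, so columns that contain missing values (like the ones set to `np.nan` in the repro) may need the NaNs dropped before the call.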

Nakulbajaj101 commented 2 months ago

I tried with 0.4.2 and it worked. On 0.4.7 there were some other issues, and 0.4.0 was missing some tests. They definitely stuffed something up with the Python engine, apparently since they made func a property.
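For anyone comparing against the version mentioned above, here is a minimal sketch for checking which evidently version is installed, using only the standard library. The helper names and the `KNOWN_GOOD` constant are hypothetical and introduced here for illustration; the only fact taken from this thread is that 0.4.2 was reported to work.

```python
from importlib import metadata

# Version reported as working in this thread (not verified here).
KNOWN_GOOD = "0.4.2"


def installed_version(package: str = "evidently"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_known_good(version, known_good: str = KNOWN_GOOD) -> bool:
    """True only when the given version equals the one reported as working."""
    return version is not None and version == known_good


current = installed_version()
if current is None:
    print("evidently is not installed")
elif matches_known_good(current):
    print(f"evidently {current} matches the version reported as working")
else:
    # Pinning would look like: pip install evidently==0.4.2
    print(f"evidently {current} differs from {KNOWN_GOOD}")
```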