evidentlyai / evidently

Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b
Apache License 2.0
4.86k stars 545 forks source link

Predefined tests expect the target variable to be named 'target' regardless of the predefined ColumnMapping #1097

Open Borka95 opened 2 months ago

Borka95 commented 2 months ago

I encountered two issues while using the EvidentlyAI package.

1) Predefined tests expect the target variable to be named 'target': Some of the predefined tests in the package expect the target variable in the dataset to be named 'target'. Although it is possible to specify a custom column mapping to define the target variable, some tests in the package only work if the target variable is actually named 'target'. This requires renaming the column before analysis to successfully execute the test. Additionally, the PredictionColumns must be named exactly 'prediction'. In my case, the columns were named 'prediction' and 'Regression', and the error message was: 'target' and 'prediction' columns must be defined in the dataset, despite a functioning column mapping and successful reports.

2) Insufficient explanation when an exception occurs: Occasionally, an exception occurs without sufficient explanation. Instead of an informative error message, only a portion of the internal code and an exception are provided, which does not clearly explain the reason for the error. This complicates troubleshooting and understanding of the problem significantly.

Steps to reproduce:

  1. Use the EvidentlyAI package with a dataset where the target variable is not named 'target'.
  2. Execute the predefined tests in the package.
  3. The code does not work because the target variable is not named 'target'.

Expected behavior: The predefined tests in the EvidentlyAI package should be able to run successfully regardless of the naming of the target variable. An elaborate and understandable error message should be provided in case an exception occurs to facilitate troubleshooting.

Additional information: Package version used: [latest] Operating system and version: [Windows 10 and VS Code] Programming language and version used: [Python 3.9]

elenasamuylova commented 2 months ago

Hi @Borka95,

Could you share:

Borka95 commented 2 months ago

Hi @elenasamuylova and thank you for your quick answer :)

I have generated a synthetic dataset. To avoid typos, I am using an enum. I have revised the code in the meantime. Unfortunately, I no longer have the state of the code as it was when I wrote the issue. Therefore, I have compiled all the tests I have performed, as well as the reports created. It was necessary to rename the "Regression" column, and "Target" with a capital letter was also not accepted. The reports worked with this ColumnMapping, but not all tests did, and I don't know which ones.

class COLUMNS(Enum): Timestamp = 'Timestamp' Temperature = 'Temperature' Humidity = 'Humidity' Pressure = 'Pressure' Speed = 'Speed' Classification = 'Classification' Regression = 'Regression' Prediction = 'prediction' Target = 'target'

column_mapping = ColumnMapping() column_mapping.target = COLUMNS.Regression.value column_mapping.prediction = COLUMNS.Prediction.value column_mapping.numerical_features = features column_mapping.categorical_features = []

Report(metrics=[ClassificationPreset()]) Report(metrics=[RegressionPreset()]) Report(metrics = [DataDriftPreset(stattest_threshold=threshold)]) Report(metrics=[ RegressionErrorBiasTable(columns=[COLUMNS.Temperature.value, COLUMNS.Humidity.value]), RegressionDummyMetric() ])

tests = TestSuite(tests=[ TestAccuracyScore(), TestPrecisionScore(), TestRecallScore(), TestF1Score(), TestColumnDrift(column_name=COLUMNS.Temperature.value, stattest_threshold=threshold), TestNumberOfRows(), TestPrecisionByClass(label=1), TestRecallByClass(label=1), TestF1ByClass(label=1), TestColumnDrift(column_name=COLUMNS.Humidity.value, stattest_threshold=threshold), TestConflictTarget(), TestConflictPrediction(), TestTargetPredictionCorrelation(), TestHighlyCorrelatedColumns(), TestTargetFeaturesCorrelations(), TestPredictionFeaturesCorrelations(), TestCorrelationChanges(), TestNumberOfDriftedColumns(stattest_threshold=threshold), TestShareOfDriftedColumns(stattest_threshold=threshold), TestColumnDrift(column_name=COLUMNS.Temperature.value, stattest='psi', stattest_threshold=threshold), ])