evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0

TestAccuracyScore calculating #488

Open samuelamico opened 1 year ago

samuelamico commented 1 year ago

Hello, I'm trying to run the TestAccuracyScore test manually, but I'm getting an error with version 0.2. In the previous version the code ran fine.

My code:

from sklearn import datasets, ensemble

from evidently import ColumnMapping
from evidently.metrics.base_metric import InputData
from evidently.tests import TestAccuracyScore
from evidently.utils.data_preprocessing import ColumnDefinition, DataDefinition

# Dataset for multiclass classification (labels)
iris_data = datasets.load_iris(as_frame=True)
iris = iris_data.frame

iris_ref = iris.sample(n=75, replace=False)
iris_cur = iris.sample(n=75, replace=False)

model = ensemble.RandomForestClassifier(random_state=1, n_estimators=3)
model.fit(iris_ref[iris_data.feature_names], iris_ref.target)

iris_ref['prediction'] = model.predict(iris_ref[iris_data.feature_names])
iris_cur['prediction'] = model.predict(iris_cur[iris_data.feature_names])

schema = ColumnMapping(
    target='target',
    prediction='prediction',
    datetime=None,
    id=None,
    numerical_features=['sepal length (cm)', 'sepal width (cm)',
                        'petal length (cm)', 'petal width (cm)'],
    categorical_features=None,
    datetime_features=None,
    task='classification'
)

columns = [
    ColumnDefinition(column_name='sepal length (cm)', column_type="num"),
    ColumnDefinition(column_name='sepal width (cm)', column_type="num"),
    ColumnDefinition(column_name='petal length (cm)', column_type="num"),
    ColumnDefinition(column_name='petal width (cm)', column_type="num")
]

iris_data_definition = DataDefinition(
    columns=columns,
    target=ColumnDefinition(column_name='target', column_type="num"),
    prediction_columns=ColumnDefinition(column_name='prediction', column_type="num"),
    id_column=None,
    datetime_column=None,
    task='classification',
    classification_labels=None
)

## Running
test_classification = TestAccuracyScore()
input_data = InputData(
    reference_data=iris_ref,
    current_data=iris_cur,
    column_mapping=schema,
    data_definition=iris_data_definition
)

test_classification.metric.calculate(data=input_data)

ERROR:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [39], in <cell line: 11>()
      3 test_classification = TestAccuracyScore()
      4 input_data = InputData(
      5     reference_data = iris_ref,
      6     current_data = iris_cur,
      7     column_mapping = schema,
      8     data_definition = ires_data_definition
      9 )
---> 11 test_classification.metric.calculate(data = input_data)

File ~/Desktop/Codes/CoachMe/API-Dev/carrot_latest/lib/python3.9/site-packages/evidently/metrics/classification_performance/classification_quality_metric.py:48, in ClassificationQualityMetric.calculate(self, data)
     44     raise ValueError("The columns 'target' and 'prediction' columns should be present")
     45 target, prediction = self.get_target_prediction_data(data.current_data, data.column_mapping)
     46 current = calculate_metrics(
     47     data.column_mapping,
---> 48     self.confusion_matrix_metric.get_result().current_matrix,
     49     target,
     50     prediction,
     51 )
     53 reference = None
     54 if data.reference_data is not None:

File ~/Desktop/Codes/CoachMe/API-Dev/carrot_latest/lib/python3.9/site-packages/evidently/metrics/base_metric.py:50, in Metric.get_result(self)
     48 def get_result(self) -> TResult:
     49     if self.context is None:
---> 50         raise ValueError("No context is set")
     51     result = self.context.metric_results.get(self, None)
     52     if isinstance(result, ErrorResult):

ValueError: No context is set
Liraim commented 1 year ago

Hi, @samuelamico,

In the new version we changed the internal structure: tests and metrics can now depend on other metrics, so it is incorrect to call a Test / Metric directly.
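
For reference, the supported way to run this test is through a TestSuite. A minimal sketch against the 0.2 API, reusing the data and ColumnMapping from your example:

from evidently.test_suite import TestSuite

# running the test inside a TestSuite sets up the context, so dependent
# metrics (e.g. the confusion matrix) are calculated before the test itself
suite = TestSuite(tests=[TestAccuracyScore()])
suite.run(reference_data=iris_ref, current_data=iris_cur, column_mapping=schema)
suite.show()  # or suite.json() / suite.as_dict()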

Can you describe why you want to use a Test / Metric directly, without Reports / TestSuites?

samuelamico commented 1 year ago

Hi @Liraim, thanks for the heads-up. My goal is to change the default test condition derived from a reference dataset. The documentation says:

With reference: if the Accuracy is over 20% higher or lower, the test fails.

I want to change the 0.20 value in the approx to another value.
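
Roughly, that condition amounts to eq=approx(reference_accuracy, relative=0.2). A sketch of the semantics with illustrative numbers (not the library's exact boundary handling):

# hypothetical values, for illustration only
ref_accuracy = 0.95
lower, upper = ref_accuracy * (1 - 0.2), ref_accuracy * (1 + 0.2)
# the test passes when: lower <= current_accuracy <= upper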

Liraim commented 1 year ago

I see. First, we will add a parameter to change this value in tests in a future version.

Second, for now you can use a "hack" to alter the behavior of the test, for example with this code:


from evidently.tests import TestAccuracyScore
from evidently.tests.base_test import TestValueCondition
from evidently.tests.utils import approx

def fixed_condition(obj):
    # keep any condition the user set explicitly
    if obj.condition.has_condition():
        return obj.condition

    result = obj.metric.get_result()
    ref_metrics = result.reference

    # with reference data: compare against the reference value,
    # using a +/-10% relative band instead of the default 20%
    if ref_metrics is not None:
        return TestValueCondition(eq=approx(obj.get_value(ref_metrics), relative=0.1))  # here the fix

    # without reference data: fall back to the dummy-model baseline
    dummy_result = obj.dummy_metric.get_result().dummy

    if obj.get_value(dummy_result) is None:
        raise ValueError("Neither required test parameters nor reference data has been provided.")

    return TestValueCondition(gt=obj.get_value(dummy_result))

# monkey-patch the default condition for all TestAccuracyScore instances
TestAccuracyScore.get_condition = fixed_condition

This should change the behavior of TestAccuracyScore globally.
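
Applied before a TestSuite run like the one shown above, the patch takes effect on the next run; a quick way to check the outcome (a sketch; see the as_dict() output for the exact field names):

# re-run after patching, then inspect the test status
suite = TestSuite(tests=[TestAccuracyScore()])
suite.run(reference_data=iris_ref, current_data=iris_cur, column_mapping=schema)
print(suite.as_dict()["tests"][0]["status"])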

Third, I'm still curious why you want to use a Test without a TestSuite.