evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0
5.46k stars 604 forks

Using ColumnDriftMetric or TestColumnDrift with virtual text descriptor columns raises AttributeError from v0.4.34 onwards #1256

Closed blee-gl closed 3 months ago

blee-gl commented 3 months ago

Hello!

Starting with v0.4.34, I'm hitting an issue where virtual text descriptors raise an AttributeError. For example, passing the WordCount descriptor to ColumnDriftMetric gives me AttributeError: 'WordCount' object has no attribute 'feature_type'. This happens when saving or showing a report or a test suite result. It doesn't seem to happen with versions 0.4.33 or lower.
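In case it helps anyone check whether their environment is in the range where I see this, here is a minimal sketch that compares a version string against 0.4.34. It assumes plain numeric X.Y.Z version strings; `parse_version` and `is_affected` are hypothetical helpers of mine, not part of evidently, and I don't know of an upper bound, so it treats everything from 0.4.34 up as affected.

```python
# First version where I observed the AttributeError (from this report).
FIRST_AFFECTED = (0, 4, 34)


def parse_version(text: str) -> tuple:
    """Turn '0.4.34' into (0, 4, 34). Assumes purely numeric parts."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_affected(text: str) -> bool:
    """Hypothetical helper: True for 0.4.34 and anything later."""
    return parse_version(text) >= FIRST_AFFECTED
```

To check the installed package, the string can come from the standard library, e.g. `is_affected(importlib.metadata.version("evidently"))`.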

Here's an MCVE using the example data from https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/chat_df.csv. I'm using "evidently==0.4.34" here.

import pandas as pd
from evidently import ColumnMapping
from evidently.descriptors import WordCount
from evidently.metrics import ColumnDriftMetric
from evidently.report import Report

# Load the dataset
chat_df = pd.read_csv("data/chat_df.csv", parse_dates=["start_time", "end_time"])
curr_df = chat_df[:20]
ref_df = chat_df[20:40]

# Create a column mapping for the dataset
column_mapping = ColumnMapping(
    datetime='start_time',
    datetime_features=['start_time', 'end_time'],
    text_features=['question', 'response'],
    categorical_features=['organization', 'model_ID', 'region', 'environment', 'feedback'],
)

# Create report with a single ColumnDriftMetric for the WordCount descriptor
report = Report(
    metrics=[
        ColumnDriftMetric(WordCount().for_column("question")),
    ]
)

# Run the report
report.run(reference_data=ref_df, current_data=curr_df, column_mapping=column_mapping)

# Show report in Jupyter notebook
report

This is the full error stack generated by the code above:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/IPython/core/formatters.py:347, in BaseFormatter.__call__(self, obj)
    345     method = get_real_method(obj, self.print_method)
    346     if method is not None:
--> 347         return method()
    348     return None
    349 else:

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/suite/base_suite.py:220, in Display._repr_html_(self)
    219 def _repr_html_(self):
--> 220     dashboard_id, dashboard_info, graphs = self._build_dashboard_info()
    221     template_params = TemplateParams(
    222         dashboard_id=dashboard_id,
    223         dashboard_info=dashboard_info,
    224         additional_graphs=graphs,
    225     )
    226     return self._render(inline_iframe_html_template, template_params)

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/report/report.py:231, in Report._build_dashboard_info(self)
    229 # set the color scheme from the report for each render
    230 renderer.color_options = color_options
--> 231 html_info = renderer.render_html(metric)
    232 set_source_fingerprint(html_info, metric)
    233 replace_widgets_ids(html_info, id_generator)

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/metrics/data_drift/column_drift_metric.py:323, in ColumnDriftMetricRenderer.render_html(self, obj)
    322 def render_html(self, obj: ColumnDriftMetric) -> List[BaseWidgetInfo]:
--> 323     result: ColumnDataDriftMetrics = obj.get_result()
    325     if result.drift_detected:
    326         drift = "detected"

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/base_metric.py:253, in Metric.get_result(self)
    251 result = self._context.metric_results.get(self, None)
    252 if isinstance(result, ErrorResult):
--> 253     raise result.exception
    254 if result is None:
    255     raise ValueError(f"No result found for metric {self} of type {type(self).__name__}")

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/calculation_engine/engine.py:71, in Engine.execute_metrics(self, context, data)
     69 logging.debug(f"Executing {type(calculation)}...")
     70 try:
---> 71     calculations[metric] = calculation.calculate(context, converted_data)
     72 except BaseException as ex:
     73     calculations[metric] = ErrorResult(exception=ex)

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/calculation_engine/python_engine.py:102, in PythonEngine.get_metric_implementation.<locals>._Wrapper.calculate(self, context, data)
    101 def calculate(self, context, data: InputData):
--> 102     return self.metric.calculate(data)

File ~/Library/Application Support/pdm/venvs/mlops-poc-model-monitoring-SHBgz1Ca-model-monitoring/lib/python3.11/site-packages/evidently/metrics/data_drift/column_drift_metric.py:287, in ColumnDriftMetric.calculate(self, data)
    285 else:
    286     if self.column_name._feature_class is not None:
--> 287         column_type = self.column_name._feature_class.feature_type
    289 datetime_column = data.data_definition.get_datetime_column()
    290 options = DataDriftOptions(all_features_stattest=self.stattest, threshold=self.stattest_threshold)

AttributeError: 'WordCount' object has no attribute 'feature_type'
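
As far as I can tell from the last frame, the failure comes down to a plain attribute access on an object that doesn't define `feature_type`. Below is a rough sketch of that pattern using stand-in classes (they are not evidently's real classes, and the `"num"` value is made up), together with a guarded lookup via `getattr`. This only illustrates the failure mode; I'm not claiming it is how the library should fix it.

```python
class DescriptorWithType:
    """Stand-in for a feature class that declares feature_type."""

    feature_type = "num"  # made-up value, for illustration only


class DescriptorWithoutType:
    """Stand-in for a feature class lacking feature_type, like WordCount in my traceback."""


def direct_lookup(feature):
    # Same access pattern as the failing frame: plain attribute access.
    return feature.feature_type


def guarded_lookup(feature, default=None):
    # Defensive variant: return a default instead of raising.
    return getattr(feature, "feature_type", default)


def raises_attribute_error(feature) -> bool:
    # Report whether the direct lookup fails with AttributeError.
    try:
        direct_lookup(feature)
    except AttributeError:
        return True
    return False
```

With these stand-ins, `direct_lookup(DescriptorWithoutType())` raises the same kind of AttributeError as above, while `guarded_lookup` returns the default.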