evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0
5.09k stars 570 forks

Tabular Data Drift By Binary Classifier #706

Open Miriam2040 opened 1 year ago

Miriam2040 commented 1 year ago

Hi,

I saw that for text and embeddings there is a binary classification method: "Evidently trains a binary classification model to discriminate between data from reference and current distribution"

I want to use the same method for tabular data but didn't see it. Is it supported for tabular data? If not, how can I implement one?

Also, a general question about data drift: can it be run as a test in a Test Suite, or only as a Report?

Thanks!

elenasamuylova commented 1 year ago

Hi @Miriam2040,

Classifier drift detection method

The classifier method is available for embeddings (docs here https://docs.evidentlyai.com/user-guide/customization/embeddings-drift-parameters) and for text data (docs here https://docs.evidentlyai.com/user-guide/customization/options-for-statistical-tests). For text data, it includes text-specific pre-processing.

It is not implemented for tabular data.

If you want to use a custom drift detection method, the docs explain how to pass a custom function: https://docs.evidentlyai.com/user-guide/customization/add-custom-drift-method
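Since the classifier method is not built in for tabular data, one way to approximate it outside Evidently is a "domain classifier": label reference rows 0 and current rows 1, train a model to tell them apart, and read the cross-validated ROC AUC as the drift score. The sketch below is not Evidently's implementation. It uses scikit-learn, and the function name `classifier_drift_score`, the random forest model, and the 0.55 cut-off are illustrative assumptions. It expects numeric features only, so categorical columns would need encoding first.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

AUC_THRESHOLD = 0.55  # hypothetical cut-off; tune it for your own data


def classifier_drift_score(reference, current, random_state=0):
    """Cross-validated ROC AUC of a model separating reference from current rows.

    A score near 0.5 means the two samples are hard to tell apart;
    a score near 1.0 means they are easy to separate.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Stack both samples and label their origin: 0 = reference, 1 = current.
    X = np.vstack([reference, current])
    y = np.concatenate([
        np.zeros(len(reference), dtype=int),
        np.ones(len(current), dtype=int),
    ])

    # Out-of-fold probabilities, so the score is not inflated by overfitting.
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


# Synthetic illustration: one sample from the same distribution, one shifted.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 3))
same = rng.normal(size=(500, 3))
shifted = rng.normal(size=(500, 3))
shifted[:, 0] += 2.0  # move the mean of the first feature

for name, sample in [("same", same), ("shifted", shifted)]:
    auc = classifier_drift_score(reference, sample)
    print(f"{name}: ROC AUC = {auc:.3f}, drift = {auc > AUC_THRESHOLD}")
```

A tree-based model is used here because it can pick up non-linear differences without feature scaling; any classifier with `predict_proba` could be substituted.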

Data drift test

Yes, you can use data drift detection as a Test Suite. There is a test preset (DataDriftTestPreset()), and separate tests you can choose from: TestNumberOfDriftedColumns(), TestShareOfDriftedColumns() for the dataset and TestColumnDrift(column_name='name') for individual columns.

You can see them in the example notebooks linked here https://docs.evidentlyai.com/examples

Or in the list of all tests: https://docs.evidentlyai.com/reference/all-tests#data-drift

Miriam2040 commented 1 year ago

Great, thanks!