evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0

Profile misses out data in Data Drift json #434

Open shaktiman101 opened 2 years ago

shaktiman101 commented 2 years ago

When I use 'Profile' to generate the data drift JSON, the output does not include the feature-specific details that the data drift Dashboard shows when you open a feature's dropdown: the 'Share' graph for categorical columns and the 'Data Drift' and 'Data distribution' graphs are all missing. Here is the code, taken from one of the sample notebooks, that I use to generate the data drift JSON:

iris_data_drift_profile = Profile(sections=[DataDriftProfileSection()])
iris_data_drift_profile.calculate(iris_frame[:75], iris_frame[75:], column_mapping=None)
iris_data_drift_profile.json()

elenasamuylova commented 2 years ago

Hi @shaktiman101!

Thanks for the question!

Btw, we recently changed the API. Instead of a JSON profile as a separate object, we now have a Report object that you can choose to display as a visual report or export as a JSON or Python dictionary.

You can follow the steps in this example notebook: https://github.com/evidentlyai/evidently/blob/main/examples/sample_notebooks/evidently_metric_presets.ipynb

Here is how you get the JSON for the DataDriftPreset:

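# Imports as used in the example notebook linked above
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

data_drift_report = Report(metrics=[
    DataDriftPreset(),
])

# adult_ref and adult_cur are the reference and current pandas DataFrames prepared in the notebook
data_drift_report.run(reference_data=adult_ref, current_data=adult_cur)
data_drift_report.json()  # returns the report as a JSON string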

Originally we did not include all the data for all the plots to make JSON more lightweight. We might change this in the future.

Could you share what your intended use case is? Where do you plan to use the JSON output - is it a particular visualization tool you want to send it to?

shaktiman101 commented 2 years ago

Hi @elenasamuylova,

Thanks for the reply.

Regarding the new Report object for data drift: I looked into it as well, but it doesn't have all the details either. In fact, the histogram data seems to have been removed too, which I guess is part of making the JSON lightweight. Could the new Report class accept some parameters or a config file so that users can decide which data they want in the output JSON?

As for the use case: we have a tool into which we want to integrate some of Evidently's functionality. Having Evidently send the JSON response and visualizing it within the tool seems more reasonable than embedding the Dashboard/tabs, because it gives a seamless user experience. This is what we think is the right approach, but we are open to any suggestions or feedback you might have and would be happy to incorporate them.
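
For a tool that draws its own charts, one possible stopgap is to compute the missing plot data directly from the reference and current samples and send it alongside the drift JSON. The following is only a minimal sketch using the Python standard library. It is not an Evidently feature: the helper names (numeric_histogram, category_shares), the column names, the sample values, and the shape of the payload are all assumptions made for illustration.

```python
import json
from collections import Counter


def numeric_histogram(reference, current, bins=10):
    """Bin both samples on shared, equal-width edges so the bars are comparable."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]

    def counts(values):
        out = [0] * bins
        for v in values:
            # Clamp so that the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            out[idx] += 1
        return out

    return {"edges": edges, "reference": counts(reference), "current": counts(current)}


def category_shares(reference, current):
    """Share of each category in both samples, for a 'Share'-style bar chart."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = sorted(set(ref_counts) | set(cur_counts))
    return {
        c: {
            "reference": ref_counts[c] / len(reference),
            "current": cur_counts[c] / len(current),
        }
        for c in categories
    }


# Illustrative values only; in practice these would be columns of the two data frames.
payload = {
    "sepal_length": numeric_histogram([5.1, 4.9, 6.2, 5.8], [6.7, 7.1, 5.9, 6.3], bins=4),
    "species": category_shares(["setosa", "setosa", "versicolor"], ["virginica", "versicolor"]),
}
print(json.dumps(payload, indent=2))
```

Using shared bin edges for both samples keeps the two histograms directly comparable, and the result is plain lists and dicts, so it serializes with json.dumps and can be merged into whatever structure the visualization tool expects.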

elenasamuylova commented 2 years ago

Hi @shaktiman101, thanks for the extra details!

We actually plan to update the JSON output (both for Tests and Reports) in the next release. We will review how best to implement it over the next few days. Cc @emeli-dral to share more once we define it.