evidentlyai / evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
https://www.evidentlyai.com/evidently-oss
Apache License 2.0
5.47k stars 604 forks

Data quality test suite saved as HTML is much bigger than data quality preset metric report (300MB vs. 3MB) #1092

Open billlyzhaoyh opened 7 months ago

billlyzhaoyh commented 7 months ago

The two files in the screenshot are generated with the code below:

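import os

# Imports added for completeness (module paths as in Evidently 0.4.x).
from evidently.metric_preset import DataQualityPreset
from evidently.report import Report
from evidently.test_preset import DataDriftTestPreset, DataQualityTestPreset, DataStabilityTestPreset
from evidently.test_suite import TestSuite

# df, data_column_mapping and data_profile_dir are defined earlier (not shown).

# Report: a single preset that summarizes every column once.
print("Generating data quality report...")
data_quality_report = Report(metrics=[
    DataQualityPreset(),
])
data_quality_report.run(reference_data=df, current_data=df, column_mapping=data_column_mapping)
data_quality_report.save_html(
    os.path.join(data_profile_dir, "data_quality.html")
)
print("Data quality report generated successfully!")

# Test suite: three presets combined, each adding its own tests.
print("Running data quality test suite...")
data_quality_test_suite = TestSuite(tests=[
    DataDriftTestPreset(),
    DataQualityTestPreset(),
    DataStabilityTestPreset(),
])
data_quality_test_suite.run(reference_data=df, current_data=df, column_mapping=data_column_mapping)
data_quality_test_suite.save_html(
    os.path.join(data_profile_dir, "data_quality_test.html")
)
print("Data quality test suite generated successfully!")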
[Screenshot 2024-05-02 at 17 15 05]

What can I do to shrink the size of the HTML output from the test suite?

elenasamuylova commented 7 months ago

Hi @billlyzhaoyh,

In the second instance (when you combine multiple Test Presets), you generate a very large number of column-level tests, compared to the first instance (where DataQualityPreset() generates summaries for all columns only once).

Many of these individual Tests have a visual render (e.g., distribution of each column), increasing the resulting HTML's size.

The solution is to create a custom Test Suite that includes the individual Tests you'd like to see, instead of combining Test Presets. https://docs.evidentlyai.com/user-guide/tests-and-reports/custom-test-suite
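
For illustration, a custom Test Suite along these lines might look like the sketch below. It assumes the Evidently 0.4.x module layout (`evidently.test_suite`, `evidently.tests`). The helper name `build_compact_suite` and the particular selection of tests are hypothetical choices, not a recommendation from the maintainers, and how much the HTML shrinks will depend on which tests you keep.

```python
def build_compact_suite(drift_columns=()):
    """Build a TestSuite from a few hand-picked tests instead of full presets.

    Assumption: Evidently 0.4.x is installed. The imports are deferred into
    the function so this module still loads where Evidently is unavailable.
    """
    from evidently.test_suite import TestSuite
    from evidently.tests import (
        TestColumnDrift,
        TestNumberOfConstantColumns,
        TestNumberOfDuplicatedRows,
        TestNumberOfRowsWithMissingValues,
    )

    # Dataset-level tests: one instance each, regardless of column count.
    tests = [
        TestNumberOfRowsWithMissingValues(),
        TestNumberOfConstantColumns(),
        TestNumberOfDuplicatedRows(),
    ]
    # Column-level tests only for the columns named explicitly by the caller.
    tests += [TestColumnDrift(column_name=name) for name in drift_columns]
    return TestSuite(tests=tests)


# Usage, with df, data_column_mapping and data_profile_dir as in the original post
# and "age" standing in for a real column name:
#   suite = build_compact_suite(drift_columns=["age"])
#   suite.run(reference_data=df, current_data=df, column_mapping=data_column_mapping)
#   suite.save_html(os.path.join(data_profile_dir, "data_quality_test.html"))
```

Listing column-level tests explicitly keeps the number of per-column renders under your control, rather than letting each preset add tests for every column.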

billlyzhaoyh commented 7 months ago

Thank you for this, @elenasamuylova. I tried looking this up but could not find an answer: is there any way to disable the visual render functionality in favour of a smaller HTML?

elenasamuylova commented 7 months ago

Hi @billlyzhaoyh, I am afraid there is no such feature currently. However, you can export the results as a JSON or Python dictionary instead: https://docs.evidentlyai.com/user-guide/tests-and-reports/run-tests#output-formats
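
As a rough sketch of working with the dictionary export: in Evidently 0.4.x a Test Suite exposes `as_dict()`, `json()` and `save_json(path)`. The helper below, `summarize_results`, is hypothetical, and it assumes the exported dictionary has a top-level `"tests"` list whose entries carry `"name"` and `"status"` keys. The sample dictionary is hand-written for the example, not real output.

```python
def summarize_results(results):
    """Return (status counts, names of tests that did not pass).

    Assumes `results` looks like the output of test_suite.as_dict():
    a dict with a "tests" list of {"name": ..., "status": ...} entries.
    """
    counts = {}
    not_passed = []
    for test in results.get("tests", []):
        status = test.get("status", "UNKNOWN")
        counts[status] = counts.get(status, 0) + 1
        if status != "SUCCESS":
            not_passed.append(test.get("name", "<unnamed>"))
    return counts, not_passed


# Hand-written stand-in for data_quality_test_suite.as_dict().
example_results = {
    "tests": [
        {"name": "Number of Duplicate Rows", "status": "SUCCESS"},
        {"name": "Number of Constant Columns", "status": "FAIL"},
    ]
}

counts, not_passed = summarize_results(example_results)
print(counts)
print(not_passed)
```

With a real suite, you would pass `data_quality_test_suite.as_dict()` to the helper, or call `data_quality_test_suite.save_json(...)` to write the results to disk without the HTML visuals.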