Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
When a data quality report is calculated for a dataframe that contains mutually exclusive columns (A is None/null whenever B is filled, and vice versa), the cramer_v (Cramér's V) correlation coefficient crashes. This can be partially circumvented in several ways (not using the preset, or editing the data for the tests that calculate correlation coefficients: filling the None values, omitting certain columns, etc.), but this is suboptimal and negates how easily presets can currently be used, especially because it becomes impossible to include missing-value tests together with correlation tests in one report.
I think this is the relevant part of the traceback after calling report.save_html(...):
  File "....venv\lib\site-packages\evidently\metrics\data_quality\dataset_correlations_metric.py", line 171, in _get_correlations
    correlations_calculate = calculate_correlations(dataset, data_definition, sum(add_text_columns, []))
  File "....venv\lib\site-packages\evidently\calculations\data_quality.py", line 404, in calculate_correlations
    correlations[kind] = _calculate_correlations(dataset, num_for_corr, cat_for_corr, kind)
  File "....venv\lib\site-packages\evidently\calculations\data_quality.py", line 392, in _calculate_correlations
    return get_pairwise_correlation(df[cat_for_corr], _cramer_v)
  File "....venv\lib\site-packages\evidently\calculations\data_quality.py", line 365, in get_pairwise_correlation
    c = func(df[columns[i]], df[columns[j]])
  File "....venv\lib\site-packages\evidently\calculations\data_quality.py", line 335, in _cramer_v
    chi2_stat = chi2_contingency(arr, correction=False)
  File "....venv\lib\site-packages\scipy\stats\contingency.py", line 333, in chi2_contingency
    raise ValueError("No data; observed has size 0.")
ValueError: No data; observed has size 0.
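One possible guard is to treat an empty (or single-row/single-column) contingency table as "association undefined" and return NaN instead of calling chi2_contingency. The following is a hypothetical sketch, not Evidently's code; the name safe_cramer_v is made up, and it assumes rows with a missing value in either column are dropped before counting:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def safe_cramer_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series, or NaN when it is undefined.

    Hypothetical helper (not part of Evidently): rows where either value is
    missing are dropped before the contingency table is built.
    """
    paired = pd.DataFrame({"x": x, "y": y}).dropna()
    # Encode the remaining categories as integer codes.
    x_codes, x_levels = pd.factorize(paired["x"])
    y_codes, y_levels = pd.factorize(paired["y"])
    n_rows, n_cols = len(x_levels), len(y_levels)

    # Mutually exclusive columns leave no paired rows, so the table would be
    # empty; a single row or column also makes V undefined. Return NaN instead
    # of letting chi2_contingency raise.
    if min(n_rows, n_cols) < 2:
        return float("nan")

    # Count co-occurrences of each (x, y) category pair.
    observed = np.zeros((n_rows, n_cols), dtype=np.int64)
    np.add.at(observed, (x_codes, y_codes), 1)

    chi2 = chi2_contingency(observed, correction=False)[0]
    phi2 = chi2 / observed.sum()
    return float(np.sqrt(phi2 / (min(n_rows, n_cols) - 1)))


# Mutually exclusive columns, as described in the report: no crash, just NaN.
a = pd.Series(["x", None, "y", None])
b = pd.Series([None, "u", None, "v"])
print(safe_cramer_v(a, b))  # nan

# Perfectly associated columns give V = 1.
print(safe_cramer_v(pd.Series(["a", "a", "b", "b"]), pd.Series(["u", "u", "v", "v"])))  # 1.0
```

Returning NaN rather than 0 keeps such a pair visibly "undefined" in a correlation matrix while still letting the rest of the report, including missing-value tests, be computed; whether NaN, 0, or skipping the pair is preferable is a design choice.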