Suppose I don't want current_small_hist and ref_small_hist in my output. There should be an option to skip the calculations for these fields, because computing them makes the data drift output take longer to generate. Consider the following code snippet:
    ref_counts = feature_ref_data.value_counts(sort=False)
    cur_counts = feature_cur_data.value_counts(sort=False)
    keys = set(ref_counts.keys()).union(set(cur_counts.keys()))
    for key in keys:
        if key not in ref_counts:
            ref_counts.loc[key] = 0
        if key not in cur_counts:
            cur_counts.loc[key] = 0
For a high-cardinality categorical feature with thousands of categories (e.g. zip code, IP address), this loop takes a long time to fill in the counts of all keys for the reference and current data, which delays the whole process. The result is only used to calculate current_small_hist and ref_small_hist in the data drift profile output, so there should be an option to skip this calculation.
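Independently of a skip option, the same alignment can probably be done without a Python-level loop. The sketch below is only an illustration, not the library's code: the helper name aligned_value_counts is made up here, and it assumes the goal is just to give both count Series the same set of keys, with 0 for categories missing on one side. It relies on pandas' Index.union and Series.reindex.

```python
from typing import Tuple

import pandas as pd


def aligned_value_counts(ref: pd.Series, cur: pd.Series) -> Tuple[pd.Series, pd.Series]:
    """Hypothetical helper: count categories and align both results on the union of keys."""
    ref_counts = ref.value_counts(sort=False)
    cur_counts = cur.value_counts(sort=False)
    # Union of all categories seen in either dataset (no sorting needed).
    keys = ref_counts.index.union(cur_counts.index, sort=False)
    # reindex adds the missing categories with a count of 0 in one vectorized step,
    # instead of enlarging the Series one key at a time with .loc.
    return (
        ref_counts.reindex(keys, fill_value=0),
        cur_counts.reindex(keys, fill_value=0),
    )


# Small usage example with toy data.
ref = pd.Series(["a", "b", "b", "c"])
cur = pd.Series(["b", "c", "c", "d"])
ref_counts, cur_counts = aligned_value_counts(ref, cur)
```

Both returned Series share the same index, so downstream code that expects matching keys should keep working. Whether this removes the bottleneck for a given dataset would need to be measured.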
More generally, there should be an option to choose which fields the data drift profile result includes. At present, all fields are produced for every feature, with no way to opt out.
The original loop quoted above is from https://github.com/evidentlyai/evidently/blob/0f279d3d908b20d6df47b88bff8800bbdf3d516e/src/evidently/calculations/data_drift.py#L281
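To make the request concrete, here is a rough sketch of what such an option could look like. Everything in it is hypothetical: the function small_hist_fields and the flag include_small_hist do not exist in Evidently, and the histograms are simplified to plain dictionaries of category counts. The only point is that the count alignment runs when the caller asks for these two fields and is skipped otherwise.

```python
from typing import Dict

import pandas as pd


def small_hist_fields(
    ref: pd.Series,
    cur: pd.Series,
    include_small_hist: bool = True,  # hypothetical flag, not an existing Evidently option
) -> Dict[str, dict]:
    """Return the two histogram fields, or nothing at all when they are not requested."""
    fields: Dict[str, dict] = {}
    if not include_small_hist:
        # Skip value_counts and key alignment entirely.
        return fields
    ref_counts = ref.value_counts(sort=False)
    cur_counts = cur.value_counts(sort=False)
    keys = ref_counts.index.union(cur_counts.index, sort=False)
    fields["ref_small_hist"] = ref_counts.reindex(keys, fill_value=0).to_dict()
    fields["current_small_hist"] = cur_counts.reindex(keys, fill_value=0).to_dict()
    return fields


# Toy data standing in for a high-cardinality feature.
ref_data = pd.Series(["10001", "10001", "94105"])
cur_data = pd.Series(["94105", "60601"])
skipped = small_hist_fields(ref_data, cur_data, include_small_hist=False)
full = small_hist_fields(ref_data, cur_data)
```

With the flag off, the result simply omits both fields, so the drift statistics themselves would not be affected.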