Open jeric250 opened 6 months ago
Hi @jeric250, could you try to run pd.to_numeric on your input columns?
Thanks @elenasamuylova for responding so quickly. Forgot to mention, I did try pd.to_numeric as well, something like:
ref_df = ref_df.apply(pd.to_numeric, errors='coerce')
However, the same error still occurred. There are also no null values in the dataset.
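For anyone following along, here is a minimal sketch of what the coercion above does and how to check its result. The column names and values are made up for illustration; only pd.to_numeric and errors='coerce' come from this thread. Note that the call converts the columns only and leaves the index untouched.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, with one unparseable entry.
ref_df = pd.DataFrame({
    "AGE": ["32", "40", "n/a"],
    "INCOME": ["1000.5", "2500", "300"],
})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising.
coerced = ref_df.apply(pd.to_numeric, errors="coerce")

# Every column should now report a numeric dtype.
print(coerced.dtypes)

# Values that failed to parse appear as NaN, so count them per column.
print(coerced.isna().sum())

# The index is not part of the conversion and stays exactly as it was.
print(coerced.index.equals(ref_df.index))
```

If the NaN counts are all zero and the dtypes are numeric, the column values themselves are probably not the cause of the error.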
When I tested a single numerical column, I got the same error.
from evidently.report import Report
from evidently.metrics import ColumnDriftMetric, ColumnValuePlot

# test on the AGE column, which represents people's ages (e.g. 32, 40)
data_drift_column_report = Report(metrics=[
    ColumnDriftMetric('AGE'),
    ColumnValuePlot('AGE'),
])
data_drift_column_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_column_report
Error:
UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U14'), dtype('float64')) -> None
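The message itself is raised by NumPy: it typically appears when a string array (dtype '<U...') is multiplied by a float64 array. As a rough sketch, the same class of error can be reproduced outside Evidently. The values below are made up, and this says nothing about where inside Evidently the multiplication actually happens.

```python
import numpy as np

# Made-up values: a unicode string array and a float64 array.
labels = np.array(["id_1", "id_2"])
weights = np.array([0.5, 1.5])

message = None
try:
    labels * weights
except TypeError as exc:  # NumPy's UFuncTypeError is a subclass of TypeError
    message = str(exc)
    print(type(exc).__name__, message)
```

So the '<U14' in the error suggests that some string data is reaching a numeric computation, even if every column passed in is numeric.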
Same error when I tried DataDriftTable:
data_drift_dataset_report = Report(metrics=[
    DataDriftTable(num_stattest='wasserstein', cat_stattest='psi'),
])
data_drift_dataset_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_dataset_report
When I limit DataDriftTable to just the categorical columns, it works fine and a report is generated.
@jeric250
I found out that the UFuncTypeError when using evidently.ai is oddly related to the index of the dataframes passed as reference_data or current_data. If your dataframes have a named index, it will cause the error: "UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U20'), dtype('float64')) -> None"
Solution: To address this, remove (drop) the index from the dataframe:
x = df.copy()
x.reset_index(drop=True, inplace=True) # <- remove index
report = Report(metrics=[ColumnDriftMetric(column_name="premium")]) # 'premium' is an arbitrary feature in my dataset
report.run(reference_data=x, current_data=x) # <- note: you should set reference_data and current_data accordingly
report
Hope this helps!
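A small pandas-only sketch of the workaround described above, for checking the index before handing the frames to Evidently. The frame, the column name and the index name are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index is a named set of string IDs.
df = pd.DataFrame(
    {"premium": [120.0, 95.5, 210.0]},
    index=pd.Index(["c001", "c002", "c003"], name="customer_id"),
)

# Inspect the index first: a name or a non-numeric dtype is worth noting.
print(df.index.name, df.index.dtype)

# drop=True discards the old index; drop=False would keep it as a regular column.
x = df.reset_index(drop=True)

# The result has a default, unnamed RangeIndex and the same columns as before.
print(x.index)
print(list(x.columns))
```

Using reset_index without inplace=True returns a new frame, so the original df keeps its index in case the IDs are needed later.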
Hi there, first time opening an issue so bear with me (and let me know if more info is needed).
Basic information:
Package version used: 0.4.20
Operating system and version: macOS, VSCode
Programming language and version used: Python 3.12.2
Code snippet:
The above code is based on Evidently documentation: https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_specify_stattest_for_a_testsuite.ipynb
Error message:
The above code snippet takes in only numerical data in a pandas DataFrame (data type of 'float64', 'int64'). When I use the exact same code for only categorical data (data type of 'object','category'), the above code works fine with a report generated.
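One way to produce the numerical-only and categorical-only subsets mentioned here is select_dtypes. This is only a sketch with a made-up frame; the column names are hypothetical and the dtypes are set explicitly so the split is unambiguous.

```python
import pandas as pd

# Hypothetical mixed-type frame; dtypes are set explicitly for clarity.
df = pd.DataFrame({
    "AGE": pd.Series([32, 40, 51], dtype="int64"),
    "INCOME": pd.Series([1000.5, 2500.0, 300.0], dtype="float64"),
    "REGION": pd.Series(["north", "south", "north"], dtype=object),
    "SEGMENT": pd.Series(["a", "b", "a"], dtype="category"),
})

# Numeric columns (int64, float64, ...).
numerical_cols = df.select_dtypes(include="number").columns.tolist()

# Columns stored as 'object' or 'category'.
categorical_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()

print(numerical_cols)
print(categorical_cols)
```

Running the report on df[numerical_cols] and df[categorical_cols] separately helps narrow down which group triggers the error.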
I checked whether the numerical data contains any weird values, and that doesn't seem to be the case. For example, to find records with non-numeric values:
ref_df[~ref_df.applymap(np.isreal).all(1)]
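An alternative check, sketched under the assumption that "non-numeric" means "cannot be parsed by pd.to_numeric": coerce every column and flag cells that had a value before but are NaN afterwards. The data below is made up. Like the one-liner above, this looks at column values only and does not inspect the index.

```python
import pandas as pd

# Hypothetical frame with one value that is not a number.
ref_df = pd.DataFrame({
    "AGE": [32, "forty", 51],
    "INCOME": [1000.5, 2500.0, 300.0],
})

# Coerce each column; anything unparseable becomes NaN.
coerced = ref_df.apply(pd.to_numeric, errors="coerce")

# A cell is suspicious if it held a value before coercion but is NaN afterwards.
bad_cells = coerced.isna() & ref_df.notna()

# Keep the rows that contain at least one suspicious cell.
bad_rows = ref_df[bad_cells.any(axis=1)]
print(bad_rows)
```

An empty bad_rows means every value in every column can be parsed as a number.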
What am I missing? Any advice?