Open jeric250 opened 1 month ago
Hi @jeric250, could you try running pd.to_numeric on your input columns?
Thanks @elenasamuylova for responding so quickly. I forgot to mention that I did try pd.to_numeric as well, something like:
ref_df = ref_df.apply(pd.to_numeric, errors='coerce')
However, the same error still occurred. There are also no null values in the dataset.
When I tested on a single numerical column, I got the same error.
# Test on the AGE column, which holds people's ages (e.g. 32, 40)
data_drift_column_report = Report(metrics=[
    ColumnDriftMetric('AGE'),
    ColumnValuePlot('AGE'),
])
data_drift_column_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_column_report
Error:
UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U14'), dtype('float64')) -> None
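For anyone reading the message above: dtype('<U14') is NumPy's notation for Unicode strings of up to 14 characters, so somewhere a string array is being multiplied by a float64 array. The sketch below only reproduces that class of error in plain NumPy with made-up values; it does not show where the multiplication happens inside Evidently.

```python
import numpy as np

# Made-up inputs: a string array and a float array.
labels = np.array(["not_a_number!!"])  # 14 characters, so dtype is '<U14'
values = np.array([1.5])               # dtype float64

try:
    np.multiply(labels, values)
    raised = False
except TypeError as exc:  # UFuncTypeError is a subclass of TypeError
    raised = True
    print(type(exc).__name__, exc)
```

If something like this fails in the same way, at least one of the arrays reaching the calculation holds strings rather than numbers.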
Same error when I tried DataDriftTable:
data_drift_dataset_report = Report(metrics=[
    DataDriftTable(num_stattest='wasserstein', cat_stattest='psi'),
])
data_drift_dataset_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_dataset_report
When I limit DataDriftTable to just categorical columns, it works fine and a report is generated.
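One thing that may be worth ruling out: the pd.to_numeric line quoted earlier coerces only ref_df, while the report also receives curr_df. Below is a minimal sketch for listing non-numeric columns in both frames. The helper name non_numeric_columns and the toy data are assumptions for illustration, not taken from the issue.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns whose dtype is not numeric."""
    return [col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])]


# Toy stand-ins for ref_df and curr_df (made-up data).
ref_df = pd.DataFrame({"AGE": [32, 40, 51]})
curr_df = pd.DataFrame({"AGE": ["29", "45", "38"]})  # numbers stored as strings

# Mirror the thread: only the reference frame is coerced.
ref_df = ref_df.apply(pd.to_numeric, errors="coerce")

print(non_numeric_columns(ref_df))   # columns still non-numeric in the reference data
print(non_numeric_columns(curr_df))  # columns still non-numeric in the current data
```

If the second list is non-empty on the real data, applying the same coercion to curr_df would be the next thing to try.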
Hi there, first time opening an issue so bear with me (and let me know if more info is needed).
Basic information:
Package version used: 0.4.20
Operating system and version: macOS VSCode
Programming language and version used: Python 3.12.2
Code snippet:
The above code is based on Evidently documentation: https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_specify_stattest_for_a_testsuite.ipynb
Error message:
The code snippet above takes only numerical data in a pandas DataFrame (dtypes 'float64' and 'int64'). When I run the exact same code on only categorical data (dtypes 'object' and 'category'), it works fine and a report is generated.
I checked whether the numerical data contains any weird values, and that doesn't seem to be the case. For example, to find records with non-numeric values:
ref_df[~ref_df.applymap(np.isreal).all(1)]
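As a cross-check on the line above, here is a hedged alternative that avoids applymap (deprecated since pandas 2.1 in favour of DataFrame.map). It flags rows where a non-null value turns into NaN under pd.to_numeric. The helper name rows_with_non_numeric and the toy frame are made up for illustration.

```python
import pandas as pd


def rows_with_non_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows holding a non-null value that pd.to_numeric cannot parse."""
    coerced = df.apply(pd.to_numeric, errors="coerce")
    # A cell is suspect if it had a value before coercion but is NaN afterwards.
    bad = coerced.isna() & df.notna()
    return df[bad.any(axis=1)]


# Made-up data: one parsable string ("40") and one unparsable string ("forty").
toy = pd.DataFrame({
    "AGE": [32, "40", "forty"],
    "INCOME": [1000.0, 2500.5, 3100.0],
})
print(rows_with_non_numeric(toy))
```

Note that a value such as "40" parses cleanly and is not flagged here, even though the column would still be stored as object dtype, so it is worth checking the column dtypes separately.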
What am I missing? Any advice?