Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Compas Dataset Raises ValueError: could not convert string to float: 'Other' #435

Closed. MrMadium closed this issue 1 year ago.

MrMadium commented 1 year ago

AIF360: 0.5.0, pandas: 1.3.5, Python: 3.8.10

Note: "Other" is one of the values in the 'race' column.

from aif360.datasets import CompasDataset

aif_df = CompasDataset(label_name='is_recid',
                       favorable_classes=[0],
                       protected_attribute_names=['sex', 'race'],
                       privileged_classes=[['Female', 'Caucasian']],
                       categorical_features=['age_cat', 'c_charge_degree', 'score_text'],
                       features_to_keep=['age', 'age_cat', 'sex', 'race', 'c_charge_degree', 'priors_count', 'days_b_screening_arrest', 'decile_score', 'score_text', 'is_recid', 'two_year_recid', 'c_jail_in', 'c_jail_out'])

ValueError: could not convert string to float: 'Other'

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/aif360/datasets/structured_dataset.py in __init__(self, df, label_names, protected_attribute_names, instance_weights_name, scores_names, unprivileged_protected_attributes, privileged_protected_attributes, metadata)
     95         try:
---> 96             df = df.astype(np.float64)
     97         except ValueError as e:

ValueError: could not convert string to float: 'Other'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/aif360/datasets/structured_dataset.py in __init__(self, df, label_names, protected_attribute_names, instance_weights_name, scores_names, unprivileged_protected_attributes, privileged_protected_attributes, metadata)
     97         except ValueError as e:
     98             print("ValueError: {}".format(e))
---> 99             raise ValueError("DataFrame values must be numerical.")
    100 
    101         # Convert all column names to strings

ValueError: DataFrame values must be numerical.
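Both tracebacks come from one step in StructuredDataset.__init__. It tries df.astype(np.float64), prints the original ValueError, and re-raises it as "DataFrame values must be numerical." The pandas-only sketch below mimics that step on a hypothetical three-row frame (made-up values, not the real COMPAS data). It is not AIF360's code. It shows that any column still holding strings such as 'Other' makes the cast fail, while one-hot encoding that column first lets it succeed. The "=" prefix separator is an assumption chosen to resemble AIF360's column naming.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names mimic COMPAS, values are made up.
df = pd.DataFrame({
    "priors_count": [0, 3, 1],
    "race": ["Caucasian", "Other", "African-American"],
})

def cast_like_structured_dataset(frame):
    """Mimics the cast shown in the traceback: float64 or a ValueError."""
    try:
        return frame.astype(np.float64)
    except ValueError as e:
        print("ValueError: {}".format(e))
        raise ValueError("DataFrame values must be numerical.")

# The raw string column reproduces the error.
try:
    cast_like_structured_dataset(df)
    failed = False
except ValueError:
    failed = True

# One-hot encode the string column first; every column is then numeric.
encoded = pd.get_dummies(df, columns=["race"], prefix_sep="=")
numeric = cast_like_structured_dataset(encoded)
print(list(numeric.columns))
```

After encoding, 'race' is replaced by one 0/1 column per observed value, so nothing string-valued reaches the float cast.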

MrMadium commented 1 year ago

By the look of it, the string-valued column had to be listed in 'categorical_features', so that it gets one-hot encoded instead of being cast to float as text. Apologies.
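The call above may also have a shape problem. StandardDataset documents privileged_classes as one entry per name in protected_attribute_names. That means [['Female', 'Caucasian']] supplies a list only for 'sex', and 'race' would be left as raw strings such as 'Other'. The sketch below is a hypothetical pandas-only approximation of that binarization step, not AIF360's implementation. It assumes the two lists are paired element-wise (modelled here with zip, which silently drops unpaired names). It contrasts the combined list with the per-attribute form [['Female'], ['Caucasian']].

```python
import pandas as pd

# Hypothetical toy rows standing in for the COMPAS protected attributes.
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male"],
    "race": ["Other", "Caucasian", "African-American"],
})
protected = ["sex", "race"]

def binarize(frame, names, privileged):
    """Sketch: 1.0 if the value is in the attribute's privileged list, else 0.0.

    Names and privileged-value lists are paired element-wise with zip (an
    assumption of this sketch); names without a matching list stay untouched.
    """
    out = frame.copy()
    for attr, vals in zip(names, privileged):
        out[attr] = out[attr].isin(vals).astype(float)
    return out

# As in the issue: one combined list, so only 'sex' is paired and binarized.
as_written = binarize(df, protected, [["Female", "Caucasian"]])
# One list per protected attribute: both columns become 0/1 floats.
per_attribute = binarize(df, protected, [["Female"], ["Caucasian"]])

print(as_written["race"].tolist())     # strings survive
print(per_attribute["race"].tolist())  # numeric
```

If the cast then fails on a timestamp instead, c_jail_in and c_jail_out (which appear to hold date strings in the raw COMPAS file) would likely need to be dropped from features_to_keep or converted to numbers first.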