Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Memory issues while opening StandardDataset #74

Open nathesh opened 5 years ago

nathesh commented 5 years ago
import pandas as pd
import sys
import numpy as np
np.random.seed(0)
from aif360.datasets import StructuredDataset as SD
from aif360.datasets import BinaryLabelDataset as BLD
from aif360.metrics import ClassificationMetric as CM
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing
from sklearn.ensemble import RandomForestClassifier as RF
from sklearn.datasets import make_classification as mc 
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

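# Synthetic data: 10,000 samples with 30 features and a binary label.
data, label = mc(n_samples=10000, n_features=30)
# Build the protected attribute by shuffling a copy of the labels (values 0/1).
bias_feature = label.copy()
np.random.shuffle(bias_feature)
# Stack features, protected attribute, and label into one array.
agg_data = np.hstack([data, bias_feature.reshape(-1, 1), label.reshape(-1, 1)])
pd_data = pd.DataFrame(agg_data, columns=list(range(1, 31)) + ["gender", "labels"])
dataset = BLD(favorable_label=0, unfavorable_label=1, df=pd_data,
              label_names=["labels"], protected_attribute_names=["gender"],
              privileged_protected_attributes=[2])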

Running the BLD(...) call above three times in a Python Jupyter notebook results in a MemoryError.
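
To narrow down where the memory goes, it may help to measure how many bytes stay allocated after each repeated construction. The following is only a minimal sketch using the standard-library tracemalloc module; the helper name measure_repeated_calls is hypothetical and not part of AIF360, and tracemalloc only sees allocations made through Python's memory allocators.

```python
import gc
import tracemalloc


def measure_repeated_calls(factory, repeats=3):
    """Call factory() `repeats` times and return the traced bytes that are
    still allocated after each call, relative to the starting baseline.

    `factory` is any zero-argument callable (hypothetical helper, not AIF360).
    Each result is dropped and garbage-collected before measuring, so steady
    growth across the returned list suggests memory is being retained.
    """
    retained = []
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            obj = factory()
            del obj          # drop our only reference to the result
            gc.collect()     # collect reference cycles before measuring
            current, _ = tracemalloc.get_traced_memory()
            retained.append(current - baseline)
    finally:
        tracemalloc.stop()
    return retained


if __name__ == "__main__":
    # Stand-in workload; in the notebook, pass a lambda wrapping the BLD(...) call.
    print(measure_repeated_calls(lambda: [0.0] * 100_000))
```

In the notebook, the same helper could be called with a lambda that wraps the BLD(...) constructor from the snippet above. If the retained byte counts keep climbing from one call to the next, that would point to objects being kept alive between runs rather than to a single oversized allocation.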

nathesh commented 5 years ago

Referenced here: https://github.com/aif360-learn/aif360-learn/issues/44