Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Refactor openml_datasets.py #375

Open anupamamurthi opened 2 years ago

anupamamurthi commented 2 years ago

https://github.com/Trusted-AI/AIF360/blob/master/aif360/sklearn/datasets/openml_datasets.py

Add a wrapper around this module so that datasets can be accessed directly through the wrapper.

Instead of doing something like this,

import os

from sklearn.datasets import fetch_openml

from aif360.sklearn.datasets.utils import standardize_dataset

# cache location
DATA_HOME_DEFAULT = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                 '..', 'data', 'raw')

def fetch_adult(subset='all', *, data_home=None, cache=True, binary_race=True,
                usecols=None, dropcols=None, numeric_only=False, dropna=True):
    if subset not in {'train', 'test', 'all'}:
        raise ValueError("subset must be either 'train', 'test', or 'all'; "
                         "cannot be {}".format(subset))
    df = fetch_openml(data_id=1590, data_home=data_home or DATA_HOME_DEFAULT,
                      cache=cache, as_frame=True).frame

The proposal is to have an OpenMLStore:

from abc import ABC, abstractmethod

class OpenMLStore(ABC):
    @abstractmethod
    def __init__(self, **kwargs):
        pass

    def download(self, data_id, data_home=None, cache=True):
        # use the data_id argument rather than a hard-coded 1590
        df = fetch_openml(data_id=data_id, data_home=data_home or DATA_HOME_DEFAULT,
                          cache=cache, as_frame=True).frame
        # decide on returning a DataFrame or just the output directory location

    def upload(self, **kwargs):
        pass

The fetch_adult() function can then be updated to use the OpenMLStore abstraction.
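
A rough sketch of how that could look, under several assumptions that are not part of the current module: the `BaseStore` interface, the `fetcher` and `store` parameters, and the placeholder `DATA_HOME_DEFAULT` path are all hypothetical, and `download` is assumed to return the DataFrame (the open question above is not settled). The sketch stops after the download; subset handling and the `standardize_dataset` call are left out.

```python
import os
from abc import ABC, abstractmethod

# Placeholder cache location; the real module derives it from __file__.
DATA_HOME_DEFAULT = os.path.join('data', 'raw')


def _sklearn_fetch(**kwargs):
    # Imported lazily so this sketch can be loaded without scikit-learn.
    from sklearn.datasets import fetch_openml
    return fetch_openml(**kwargs)


class BaseStore(ABC):
    """Hypothetical interface that any dataset store would implement."""

    @abstractmethod
    def download(self, data_id, data_home=None, cache=True):
        """Return the dataset identified by data_id."""


class OpenMLStore(BaseStore):
    def __init__(self, fetcher=_sklearn_fetch):
        # fetcher is injectable so the store can be tested offline.
        self._fetcher = fetcher

    def download(self, data_id, data_home=None, cache=True):
        bunch = self._fetcher(data_id=data_id,
                              data_home=data_home or DATA_HOME_DEFAULT,
                              cache=cache, as_frame=True)
        # Assumption: return the DataFrame rather than a directory path.
        return bunch.frame


def fetch_adult(subset='all', *, store=None, data_home=None, cache=True):
    if subset not in {'train', 'test', 'all'}:
        raise ValueError("subset must be either 'train', 'test', or 'all'; "
                         "cannot be {}".format(subset))
    store = store or OpenMLStore()
    # 1590 is the OpenML data_id used for the Adult dataset above.
    return store.download(1590, data_home=data_home, cache=cache)
```

With this shape, the other fetch_* functions would only differ in the data_id they pass to the store, and a test could swap in a fake fetcher instead of hitting OpenML.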

hoffmansc commented 2 years ago

Can you elaborate on the shortcomings of the current method?

yoshimii commented 1 year ago

Hello, I'd like to work on this issue.

jainsunishka commented 1 year ago

I am making progress on this issue, and I would like to continue on this.

vandanapathare commented 1 year ago

Hello, I would like to work on this issue.

jainsunishka commented 1 year ago

> Hello I would like to work on this issue

Hey @vandanapathare. I have already raised the PR and am finishing up my code review.