Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0
2.46k stars, 840 forks

Suggestion - Use external datasets in notebook examples #367

Open onaly opened 2 years ago

onaly commented 2 years ago

Hello, I have noticed that the notebook examples for all the algorithms repeatedly use datasets that ship with the package and can be loaded easily. I think the examples would be far more instructive if they loaded datasets from external sources (a CSV file, for example). That would help readers understand how all the objects and classes are used. Thanks in advance.

nrkarthikeyan commented 2 years ago

A notebook example can be created to illustrate how to use a CSV file to create both the "classic" AIF360 StandardDataset and the sklearn-compatible version using a pandas DataFrame.

Example of StandardDataset creation: https://github.com/Trusted-AI/AIF360/blob/master/examples/tutorial_bias_advertising.ipynb (see cell 11)

Example of a pandas DataFrame compatible with the sklearn version: https://github.com/Trusted-AI/AIF360/blob/master/examples/sklearn/demo_new_features.ipynb (note how the protected attributes are in the index for X, a DataFrame, and y, a Series)
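For readers who want the gist without opening the notebooks, here is a minimal sketch of both layouts starting from CSV text. The column names (`age`, `sex`, `income`, `approved`), the data values, and the helper `to_standard_dataset` are made up for illustration and do not come from the linked notebooks. The `StandardDataset` keyword arguments reflect my understanding of the documented constructor, so check them against your installed version.

```python
import io

import pandas as pd

# Stand-in for an external file; with a real file, pass its path to pd.read_csv.
# Column names and values are invented for illustration.
CSV_TEXT = """age,sex,income,approved
34,1,52000,1
29,0,48000,0
45,1,61000,1
38,0,39000,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# sklearn-compatible layout: keep the protected attribute in the index of
# both X (DataFrame) and y (Series) so it travels with every row.
df = df.set_index("sex", drop=False)
X = df.drop(columns="approved")
y = df["approved"]


def to_standard_dataset(frame):
    """Hypothetical helper: wrap the same data in the "classic" StandardDataset.

    aif360 is imported lazily so the pandas part above runs without it.
    """
    from aif360.datasets import StandardDataset

    return StandardDataset(
        frame.reset_index(drop=True),  # plain integer index, "sex" stays a column
        label_name="approved",
        favorable_classes=[1],
        protected_attribute_names=["sex"],
        privileged_classes=[[1]],
    )
```

With aif360 installed, `to_standard_dataset(df)` should give the classic dataset object, while `X` and `y` are in the shape the sklearn-compatible API expects.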

hoffmansc commented 2 years ago

Is this what you're looking for (see the "Load a custom dataset" section)?

https://github.com/Trusted-AI/AIF360/blob/master/examples/sklearn/monthly_bee_datasets_metrics.ipynb

onaly commented 2 years ago

I am now quite familiar with AIF360. My suggestion was more about the design of the tutorials. Tutorials should be as close as possible to real-world scenarios and should give the big picture. When using a mitigation algorithm X, one would load an external dataset, preprocess it into the right format, and then run the algorithm on it. Therefore, a tutorial on X should show all of these steps. I would also add that the tutorials would be more instructive if they showed "negative" patterns. What I mean is that showing things that will not work or that will fail (with errors) helps the user learn how things should be done.
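As a rough illustration of the "negative pattern" idea, the sketch below first feeds a raw CSV through a format check and lets it fail, then shows the preprocessing that makes it pass. The function `check_model_ready`, the column names, and the data are all hypothetical. The checks (no missing values, numeric columns only, binary label) are assumptions about what a numeric dataset wrapper typically requires, not AIF360's own validation.

```python
import io

import pandas as pd

# Invented data with two typical problems: text columns and a missing value.
RAW_CSV = """age,sex,approved
34,male,yes
29,female,no
,male,yes
41,female,no
"""


def check_model_ready(frame, label, protected):
    """Hypothetical pre-flight check: raise ValueError on common format problems."""
    missing = [c for c in (label, protected) if c not in frame.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if frame.isna().any().any():
        raise ValueError("missing values present; drop or impute them first")
    text_cols = [
        c for c in frame.columns if not pd.api.types.is_numeric_dtype(frame[c])
    ]
    if text_cols:
        raise ValueError(f"non-numeric columns: {text_cols}")
    if frame[label].nunique() != 2:
        raise ValueError(f"label {label!r} is not binary")


raw = pd.read_csv(io.StringIO(RAW_CSV))

# Negative pattern: passing the raw file straight through fails.
try:
    check_model_ready(raw, label="approved", protected="sex")
    raw_error = None
except ValueError as exc:
    raw_error = str(exc)

# Positive pattern: drop incomplete rows, then encode text as integers.
clean = raw.dropna().assign(
    sex=lambda d: d["sex"].map({"male": 1, "female": 0}).astype(int),
    approved=lambda d: d["approved"].map({"yes": 1, "no": 0}).astype(int),
)
check_model_ready(clean, label="approved", protected="sex")  # raises nothing
```

After this step the cleaned frame is in a shape that can be wrapped in a dataset object and handed to a mitigation algorithm, which is the remaining part of the workflow described above.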

nrkarthikeyan commented 2 years ago

Thanks for the comments. If you would like to update any of the tutorials or create new ones for real-world use cases, let us know. We will be happy to help :)

MiKueen commented 1 year ago

Hi, I would like to create a tutorial using an external dataset.