Open onaly opened 2 years ago
A notebook example can be created to illustrate how to use a CSV file to create both the "classic" AIF360 StandardDataset and the sklearn-compatible version using a pandas DataFrame.
Example of StandardDataset creation: https://github.com/Trusted-AI/AIF360/blob/master/examples/tutorial_bias_advertising.ipynb (see cell 11)
Example of a pandas DataFrame compatible with the sklearn version: https://github.com/Trusted-AI/AIF360/blob/master/examples/sklearn/demo_new_features.ipynb (note how the protected attributes are in the index for X (a DataFrame) and y (a Series))
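For readers who want the gist without opening the notebooks, here is a minimal pandas-only sketch of the sklearn-compatible layout described above, with the protected attributes placed in the index of both X and y. The CSV content, the column names (`age`, `sex`, `income`, `label`) and the helper `load_for_sklearn_api` are all hypothetical and not part of AIF360. Keeping the protected column in X as well (`drop=False`) is an assumption of this sketch.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an external file on disk.
CSV_TEXT = """age,sex,income,label
25,Male,40000,1
47,Female,52000,0
33,Female,61000,1
51,Male,38000,0
"""


def load_for_sklearn_api(csv_source, label_col, protected_cols):
    """Hypothetical helper (not an AIF360 function).

    Reads a CSV and returns (X, y) with the protected attributes
    stored in the index of both objects.
    """
    df = pd.read_csv(csv_source)
    # drop=False keeps the protected columns as features too (an assumption
    # of this sketch); they are additionally copied into the index.
    df = df.set_index(protected_cols, drop=False)
    y = df[label_col]                  # Series, shares the index with X
    X = df.drop(columns=[label_col])   # DataFrame of features
    return X, y


# A path such as "my_data.csv" could be passed instead of the StringIO buffer.
X, y = load_for_sklearn_api(io.StringIO(CSV_TEXT), "label", ["sex"])
print(X.index.names, y.index.names)
```

String-valued columns such as `sex` would still need encoding before being passed to most scikit-learn estimators.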
Is this what you're looking for (see the "Load a custom dataset" section)?
https://github.com/Trusted-AI/AIF360/blob/master/examples/sklearn/monthly_bee_datasets_metrics.ipynb
I am now quite familiar with AIF360. My suggestion was more about the design of the tutorials. Tutorials should be as close as possible to real-world scenarios and should give the big picture. When using a mitigation algorithm X, one would load an external dataset, preprocess it into the right format, then run the algorithm on it. Therefore, a tutorial on X should show all these steps. I would also add that the tutorials would be more instructive if they showed "negative" patterns: showing things that will not work or that will fail (with errors) helps the user understand how things should be done.
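As an illustration of what a "negative" pattern could look like in a tutorial, here is a small sketch that first shows a failing case and then the fix. It relies on the note earlier in this thread that the sklearn-compatible format keeps protected attributes in the index. The check `require_protected_in_index` and the toy data are hypothetical; this is not an AIF360 function, and the library's own error for this mistake may look different.

```python
import pandas as pd


def require_protected_in_index(X, prot_attr):
    """Hypothetical pre-flight check (not part of AIF360).

    Raises a ValueError naming every protected attribute that is
    not a level of X's index; returns X unchanged otherwise.
    """
    missing = [name for name in prot_attr if name not in X.index.names]
    if missing:
        raise ValueError(
            f"protected attributes missing from the index: {missing}"
        )
    return X


# Toy data standing in for a freshly loaded external CSV.
raw = pd.DataFrame({"sex": [0, 1, 1, 0], "age": [25, 47, 33, 51]})

# Negative pattern: the frame is used as loaded, so "sex" is only a column.
try:
    require_protected_in_index(raw, ["sex"])
    failed = False
except ValueError as err:
    failed = True
    message = str(err)
    print("fails as expected:", message)

# Fix: move the protected attribute into the index before going further.
fixed = require_protected_in_index(raw.set_index("sex", drop=False), ["sex"])
print(fixed.index.names)
```

Showing the failing call next to the corrected one makes the required data layout explicit, which is the point of the suggestion above.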
Thanks for the comments. If you would like to update any of the tutorials or create new ones for real world use cases, let us know. We will be happy to help :)
Hi, I would like to create a tutorial using an external dataset.
Hello, I have noticed that the notebook examples for all the algorithms repeatedly use datasets that are already available in the package and can be loaded easily. I think the examples would be far more instructive if they loaded datasets from external sources (a CSV file, for example). It would help users understand how all the objects and classes are used. Thanks in advance.