ashryaagr / Fairness.jl

Julia Toolkit with fairness metrics and bias mitigation algorithms
https://ashryaagr.github.io/Fairness.jl/dev/
MIT License
31 stars, 14 forks

How to contribute? #3

Open Faldict opened 4 years ago

Faldict commented 4 years ago

Hello,

I find this project really interesting, and I would like to contribute. Where can I start?

ashryaagr commented 4 years ago

Glad to hear that you found the project interesting. The project is currently being developed by me and my mentors under the JSOC program by Julia Computing. After confirmation from my mentors, I will let you know shortly whether we can have external open-source contributions as well before our first release. Thanks!

ashryaagr commented 4 years ago

I have confirmed. We can have external open-source contributions :-)

Before starting with any contribution I recommend that you first go through the documentation and understand the design of the package and concepts like fairness tensor and wrappers.

Possible contributions can be

These are a few possible things I could think of. There might be even more. Feel free to reach out with any suggestions, feedback, or issues. I am available on the Julia Slack workspace as "Ashrya Agrawal". You can join the workspace using https://slackinvite.julialang.org/

vollmersj commented 4 years ago

@Faldict thanks for reaching out - the list above is quite comprehensive - let us know where your interests and strengths lie and we can work something out - happy to jump on a call. Plots are great: Aequitas is doing a great job at this

Faldict commented 4 years ago

Thanks for your response! I think I could start with adding the fairness datasets. My question here is: why does the dataset macro return the tuple (X, Y, Y_hat)? If I understand correctly, Y_hat is the prediction, and producing it may require training a model on the dataset. Why not return the sensitive attributes directly?

ashryaagr commented 4 years ago

The macro you are referring to loads toy data with only 10 rows. It returns (X, y, ŷ) just so that users can try out things like metrics without first fitting an algorithm and predicting.
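As a rough illustration of that idea, here is a minimal sketch of a toy-data macro. The macro name `@toy_fairness_data`, the column names, and all values are made up for this example; this is not the package's actual macro or data.

```julia
# Hypothetical macro (NOT the package's real one): returns ten rows of toy
# features together with ground-truth labels and fixed "predictions".
macro toy_fairness_data()
    quote
        let
            # X holds a sensitive attribute (`group`) and one ordinary feature.
            X = (group = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "B"],
                 score = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3, 0.5, 0.95])
            y = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]   # ground-truth labels
            ŷ = [1, 0, 1, 1, 1, 0, 0, 0, 0, 1]   # hard-coded predictions
            (X, y, ŷ)
        end
    end
end

X, y, ŷ = @toy_fairness_data()

# A metric can be evaluated right away, without training any model.
accuracy = sum(y .== ŷ) / length(y)
println("accuracy on toy data: ", accuracy)
```

Because ŷ is shipped with the data, metrics can be tried out immediately; the sensitive attribute stays available as a column of X.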

But when adding macros for real datasets like COMPAS, German, Adult, etc., we would not need the macro to return ŷ, so we can simply return (X, y). It is going to be very similar to the macros available at https://github.com/alan-turing-institute/MLJBase.jl/blob/master/src/data/datasets.jl#L200 . Let me know if you need further clarification on this.
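A minimal sketch of the (X, y) shape described above, assuming the dataset is held as a NamedTuple of columns. The helper `split_target` and the column names are hypothetical and only illustrate the idea; they are not taken from Fairness.jl or MLJBase.

```julia
# Hypothetical helper: split a column table into features X and target y.
function split_target(table::NamedTuple, target::Symbol)
    feature_names = filter(!=(target), keys(table))  # every column except the target
    X = NamedTuple{feature_names}(table)             # keep only the feature columns
    y = table[target]
    return X, y
end

# Made-up rows standing in for a real dataset such as Adult or COMPAS.
table = (age    = [25, 40, 31],
         sex    = ["F", "M", "F"],
         income = [1, 0, 1])

X, y = split_target(table, :income)
println(keys(X))
```

The sensitive attribute (here `sex`) remains an ordinary column of X, and ŷ only appears once a model has been fitted.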

Faldict commented 4 years ago

Thanks for your clarification. I have added the COMPAS and Adult datasets. Do I need to write the test scripts for them?

Another question: when I install the package for testing, I get the following error:

┌ Warning: julia version requirement for package MLJFlux not satisfied
└ @ Pkg.Operations /Users/sabae/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.2/Pkg/src/Operations.jl:225
ERROR: Unsatisfiable requirements detected for package Flux [587475ba]:
 Flux [587475ba] log:
 ├─possible versions are: [0.4.1, 0.5.0-0.5.4, 0.6.0-0.6.10, 0.7.0-0.7.3, 0.8.0-0.8.3, 0.9.0, 0.10.0-0.10.4, 0.11.0] or uninstalled
 ├─restricted to versions 0.10.4-0.10 by MLJFlux [094fc8d1], leaving only versions 0.10.4
 │ └─MLJFlux [094fc8d1] log:
 │   ├─possible versions are: 0.1.2 or uninstalled
 │   └─MLJFlux [094fc8d1] is fixed to version 0.1.2
 └─restricted to versions 0.10.3 by an explicit requirement — no versions left

As a result, I am not able to install the package.

ashryaagr commented 4 years ago

Thanks a lot for working on the dataset macros. It would be great if you could also write tests (test/datasets/datasets.jl) for the datasets you add. Another comment on your commit 1111e96e27e21904606e7ef103c2d47513e1920a: it would be better to download the datasets only when required. So, when the macro is called, we can check whether the data directory contains the dataset. If the directory does not have the dataset, it is then downloaded from the specified link. I will add an example macro and corresponding test for some other fairness dataset for your reference.
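The download-on-demand check described above could look roughly like the sketch below. The helper name `ensure_dataset`, its `fetch` keyword, and the example URL are assumptions made for illustration, not the package's code. The default uses `Downloads.download` from the standard library (Julia 1.6 and later); on older versions such as 1.2, `Base.download(url, path)` plays the same role.

```julia
using Downloads  # standard library since Julia 1.6

# Hypothetical helper: return the local path of a dataset file, downloading
# it only if it is not already present in `data_dir`.
function ensure_dataset(data_dir::AbstractString, fname::AbstractString,
                        url::AbstractString; fetch = Downloads.download)
    path = joinpath(data_dir, fname)
    if !isfile(path)
        mkpath(data_dir)   # create the data directory if it is missing
        fetch(url, path)   # runs only when the file is absent
    end
    return path
end

# Offline demonstration with a stand-in fetch function and a placeholder URL.
calls = Ref(0)
fake_fetch(url, path) = (calls[] += 1; write(path, "a,b\n1,2\n"); path)

dir = mktempdir()
p1 = ensure_dataset(dir, "toy.csv", "https://example.invalid/toy.csv"; fetch = fake_fetch)
p2 = ensure_dataset(dir, "toy.csv", "https://example.invalid/toy.csv"; fetch = fake_fetch)
println("fetch was called ", calls[], " time(s)")
```

Passing the fetch function as a keyword keeps the check testable without network access; a dataset macro could call such a helper before reading the file.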

I am not sure why this version incompatibility issue is occurring on your system. But the MLJFlux package is not required by this package. In commit 9dee3303212ab3a3b2d4834cdabbf7a3d3dce741, I have removed inessential packages like MLJFlux from the dependencies. Please let me know if you still face any setup issues after pulling the changes.
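One possible reading of the resolver log quoted earlier in this thread is that two constraints on Flux have an empty intersection: MLJFlux 0.1.2 allows only 0.10.4 within the 0.10 series, while an explicit requirement pins 0.10.3. Where that explicit requirement came from is not stated in the log. The sketch below reproduces that reasoning with plain `VersionNumber` values; it does not call Pkg, and the two predicate names are made up for illustration.

```julia
# Flux versions in the 0.10 series, as listed in the log (0.10.0-0.10.4).
candidates = [VersionNumber(0, 10, patch) for patch in 0:4]

# Constraint the log attributes to MLJFlux 0.1.2: "0.10.4-0.10".
allowed_by_mljflux(v) = v"0.10.4" <= v < v"0.11.0"

# Constraint the log calls "an explicit requirement": exactly 0.10.3.
allowed_by_explicit(v) = v == v"0.10.3"

after_mljflux = filter(allowed_by_mljflux, candidates)
after_both    = filter(allowed_by_explicit, after_mljflux)

println("left after MLJFlux constraint: ", after_mljflux)
println("no versions left: ", isempty(after_both))
```

Under this reading, dropping either constraint (for example, removing MLJFlux from the dependencies) leaves at least one installable Flux version.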

ashryaagr commented 4 years ago

@Faldict you might want to look at the macro I have added for the German credit data: https://github.com/ashryaagr/MLJFair.jl/blob/master/src/datasets/datasets.jl The corresponding tests are available at https://github.com/ashryaagr/MLJFair.jl/blob/master/test/datasets/datasets.jl

I hope these make it easier for you to add the macros and tests for other fairness datasets.
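For orientation, a test for a dataset loader might check shapes and columns along these lines. This is only a sketch using the `Test` standard library; `load_example` is a hypothetical stand-in with made-up rows, not one of the package's macros, and the linked test file remains the actual reference.

```julia
using Test

# Hypothetical stand-in for a dataset loader that returns (X, y).
load_example() = ((age = [25, 40, 31], sex = ["F", "M", "F"]), [1, 0, 1])

@testset "example dataset" begin
    X, y = load_example()
    @test length(y) == 3                              # expected number of rows
    @test all(length(col) == length(y) for col in X)  # every column matches y in length
    @test :sex in keys(X)                             # sensitive attribute is present
    @test Set(y) ⊆ Set([0, 1])                        # binary target
end
```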

Faldict commented 4 years ago

@ashryaagr Thanks a lot! I have fixed this problem.