Closed dashaasienga closed 7 months ago
Yes! I was just thinking today about how close we are to the end of the semester and that we should talk about the Ch 1 expectations. We can discuss all this tomorrow when we meet, but I did want to mention that I uploaded the R script containing the code I used to create the PPV, NPV, and FPR plots. I wrote it quickly, so it's uncommented and disorganized, but I can talk through it tomorrow and update it . . .
Yes, let's definitely talk through the R Script and expectations for Chapter 1 + Fall presentations tomorrow!
Update: I was able to reproduce the experimentation, though I still need to clean up/organize the notebook and understand all the code. Quick question: I don't know how to push the files from the Jupyter HPC to GitHub, and I don't see an option to download them to my local computer first and then push :( I'm wondering if you happen to know how, though I may also be missing something. We can ask Andy about this as well.
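One possible route, if the HPC's Jupyter interface exposes a terminal (or if you prefix commands with `!` in a notebook cell): use git directly on the HPC instead of downloading first. This is just a sketch under that assumption; the directory, file names, and `<your-repo-url>` below are placeholders to fill in, and git must already be installed on the HPC.

```shell
# From a terminal on the HPC (or prefix each line with ! in a notebook cell).
# The paths, file names, and <your-repo-url> are placeholders.
cd ~/thesis-work                       # directory containing the notebook
git init                               # skip if this is already a git repo
git remote add origin <your-repo-url>  # e.g. the HTTPS URL of the GitHub repo
git add experiments.ipynb results/     # stage the notebook and saved results
git commit -m "Add HPC experiment notebook and results"
git push -u origin main                # GitHub will prompt for credentials
                                       # (a personal access token over HTTPS)
```

If the HPC blocks outbound connections to GitHub, `scp`/`rsync` to a local machine and pushing from there would be the fallback, which Andy could confirm.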
@katcorr
Overview
This week I mostly focused on identifying loose ends and how to tie them up as we move forward, as well as filling some gaps in my understanding before moving on to the next phase of the thesis work. This has also raised questions about what the most important things to think about are in the coming weeks. It's dawning on me that the semester is almost over! Time is moving so fast!
In the next few weeks, I hope to fully cement my understanding of:
Big Picture Questions
Wrapping up Regression
Experimentation:
I was able to run all the experiments on the HPC and save the results! I'm having trouble, however, analyzing the results and creating the plots the way the researchers did :( I'm troubleshooting that and hope to have it done by our meeting time.
Application:
I found the GPA data set that the researchers used: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/O35FW8
It may be interesting to show the implementation of the Seldonian algorithm in a regression setting, since the toy example we used is also a regression. As we noted, the toy example was a bit impractical for the real world because it constrains the MSE to fall within specific bounds. Since the thesis motivation is to use the Seldonian algorithm to address fairness problems in machine learning, it may be worthwhile to have a practical example showing how this works in a fairness context before we move on to classification. We may be able to show some of the conflicts as well. We can look into this together and think more about it.
This link may also offer some useful resources on how to do this, particularly item D: https://seldonian.cs.umass.edu/Tutorials/tutorials/
Specific Questions
Thanks for your explanation of the formula for fairness conflicts! Though there are a lot of pieces to keep track of, it was intuitive and easy to follow, and it's definitely a very important result! This is something we can illustrate with the GPA and COMPAS data sets. I was wondering: how did you create the plots? They really helped explain the concepts.
I'd like to re-visit this since we didn't get to it last time:
For context,
Note that S here refers to the probability score/prediction (i.e., the probability that an observation falls in the positive class).
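Since S is a probability score, the PPV, NPV, and FPR in the plots can be recovered by thresholding S into hard predictions and counting the confusion-matrix cells. A minimal Python sketch (the function name, threshold of 0.5, and toy data are made up for illustration, not taken from the R script):

```python
def confusion_rates(scores, labels, threshold=0.5):
    """Threshold probability scores S into 0/1 predictions, then
    compute PPV, NPV, and FPR from the confusion counts."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")  # P(Y=1 | predicted 1)
    npv = tn / (tn + fn) if (tn + fn) else float("nan")  # P(Y=0 | predicted 0)
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")  # P(predicted 1 | Y=0)
    return ppv, npv, fpr

# Toy example: six observations with scores S and true labels Y.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
labels = [1,   1,   0,   0,   0,   1]
ppv, npv, fpr = confusion_rates(scores, labels)
# ppv = 0.75, npv = 1.0, fpr = 1/3
```

Computing these per group (e.g., by race in the COMPAS data) and comparing across groups is how the three categories of fairness definitions get operationalized.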
There's a lot more we can discuss regarding these 3 categories of fairness definitions. Key literature source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8913820/
Key Next Steps
Once this is wrapped up, we can move on to the classification setting, where I envision we'll run the main simulations and the main application with the COMPAS data set.