dashaasienga / Statistics-Senior-Honors-Thesis


Thesis Application: COMPAS Analysis #30

Closed: dashaasienga closed this issue 3 months ago

dashaasienga commented 4 months ago

@katcorr

I hope your weekend is going great so far!

I've been working on the full COMPAS data wrangling and analysis in a separate file (see https://github.com/dashaasienga/Statistics-Senior-Honors-Thesis/blob/main/R/COMPAS-Analysis.pdf). It's incomplete, but I've made some progress. I expect to keep working on it before our meeting on Tuesday and during the week, but you are welcome to review what I've done so far any time between now and Tuesday! The purpose of this file is to have all the COMPAS analysis in one place, and then we can pick the most important results to include in the actual body of the thesis. I write in the first person since I am walking through the analysis process step by step, but that won't be the language of the thesis. It's more like a technical appendix. We can also discuss on Tuesday if there is anything more that you think would be useful to add or edit, but I think it's coming together pretty nicely so far! My goal is to, at the very least, have most of the analysis complete by Friday so that I can spend the weekend writing up a proper draft of the application chapter by our next meeting on Tuesday, March 5th.

FYI, I updated the SQL script and re-pulled the data from the database in order to select only the variables that seem most important, display the variables in a way that makes more sense to me, avoid duplicate and unnecessary columns, and add some information from other tables. The data is so much cleaner that way, and one of the variables I added, days_in_jail, seems to actually be one of the most important ones based on my analysis. I then wrangle the data further in the Rmd file before proceeding with the analysis. See #28 for the updated SQL script and documentation, which I also pushed to GitHub.
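As a rough illustration of how a variable like days_in_jail can be pulled in from another table, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`people`, `jailhistory`, `person_id`, `in_custody`, `out_custody`) are hypothetical stand-ins, not the actual SQL script from #28. The query sums calendar days across all of a person's jail stays and treats people with no jail record as 0 days.

```python
import sqlite3

# Hypothetical schema: one row per person, one row per jail stay.
# Table and column names are illustrative assumptions, not the real COMPAS database schema.
DAYS_IN_JAIL_QUERY = """
    SELECT p.id,
           COALESCE(
               CAST(SUM(julianday(date(j.out_custody)) - julianday(date(j.in_custody))) AS INTEGER),
               0
           ) AS days_in_jail
    FROM people AS p
    LEFT JOIN jailhistory AS j ON j.person_id = p.id
    GROUP BY p.id
    ORDER BY p.id
"""


def days_in_jail_by_person(conn):
    """Return (person id, total calendar days in jail) for every person."""
    return conn.execute(DAYS_IN_JAIL_QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE people (id INTEGER PRIMARY KEY);
        CREATE TABLE jailhistory (person_id INTEGER, in_custody TEXT, out_custody TEXT);
        INSERT INTO people (id) VALUES (1), (2), (3);
        INSERT INTO jailhistory VALUES
            (1, '2013-01-01 10:00:00', '2013-01-05 09:00:00'),
            (1, '2013-03-01 00:00:00', '2013-03-03 12:00:00'),
            (2, '2013-02-10 08:00:00', '2013-02-10 20:00:00');
        """
    )
    print(days_in_jail_by_person(conn))  # [(1, 6), (2, 0), (3, 0)]
```

Counting calendar days (via `date()`) means a same-day release counts as 0 days; measuring fractional days instead would be an equally defensible choice.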

See you on Tuesday!

dashaasienga commented 4 months ago

@katcorr

My apologies for how late this issue update is coming. Things have been taking much longer than I anticipated, which has been frustrating since time seems to be moving so much faster these days. Still, I am so relieved that I finally have a Seldonian solution, along with its predicted values. I have also recreated the tables for both logistic regression and the Seldonian solution! We can discuss those tomorrow and adjust as needed, but I'm glad I at least have a solution and a workflow that I can easily adjust. See https://github.com/dashaasienga/Statistics-Senior-Honors-Thesis/blob/main/R/COMPAS-Analysis.pdf.
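For context on what "having a Seldonian solution" means: in the Seldonian framework, a candidate model found on one part of the data is returned only if it passes a safety test on held-out data, which checks that a high-confidence upper bound on the constraint function $g(\theta)$ is at most 0; otherwise the algorithm reports "No Solution Found." The minimal sketch below assumes per-observation unbiased estimates of $g(\theta)$ that lie in a known range and uses Hoeffding's inequality for the bound. It is an illustrative stand-in only; the COMPAS analysis may use a different bound (such as Student's t) and a different constraint.

```python
import math
from statistics import fmean


def hoeffding_upper_bound(g_hat, delta, low, high):
    """(1 - delta)-confidence upper bound on E[g_hat] for i.i.d. values in [low, high]."""
    n = len(g_hat)
    margin = (high - low) * math.sqrt(math.log(1 / delta) / (2 * n))
    return fmean(g_hat) + margin


def safety_test(g_hat, delta, low, high):
    """Pass (True) only if the high-confidence upper bound on g(theta) is <= 0."""
    return hoeffding_upper_bound(g_hat, delta, low, high) <= 0


# Toy safety-set estimates of g(theta); values are illustrative, not COMPAS results.
comfortably_safe = [-0.5] * 200
borderline = [-0.05] * 200
print(safety_test(comfortably_safe, delta=0.05, low=-1, high=1))  # True
print(safety_test(borderline, delta=0.05, low=-1, high=1))        # False
```

A smaller $\delta$ or a smaller safety set widens the margin, which makes the safety test harder to pass.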

It seems I'm running about one week behind my plan (https://github.com/dashaasienga/Statistics-Senior-Honors-Thesis/issues/25), since I now expect to finish the Chapter 3 draft by the end of this week or weekend instead of last week as planned. Before our next Tuesday meeting, I'll also finish and edit the first part of Chapter 4, which I had already started writing and you had reviewed (good thing I already have a bit of a head start on that!). If all goes according to plan, that leaves the simulation code and simulation results pending as of early next week. I'm hoping to have a decent draft of the simulation code by the time spring break starts so I'm not too far off schedule.

See you tomorrow!

dashaasienga commented 4 months ago

@katcorr

I've finished incorporating $\epsilon = 0.2, 0.1, 0.05, 0.01$ (see https://github.com/dashaasienga/Statistics-Senior-Honors-Thesis/blob/main/R/COMPAS-Analysis.pdf).

Super interesting results! I've put the snapshot tables below (using a 50% classification threshold, but I've emailed Phil to confirm this just in case)! The tradeoff is the FNR: it gets worse as the constraint tightens, and the strictest constraint ($\epsilon = 0.01$) has a 100% FNR because it classifies every observation into the negative class. Its accuracy of 64% is therefore just the share of negative-class observations, only about 6 percentage points lower than logistic regression.
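The arithmetic behind the 100% FNR and 64% accuracy can be checked with a small helper. This is a minimal sketch, not the code in the Rmd file: it assumes predicted probabilities are turned into classes with a 0.5 threshold (a probability of at least 0.5 counts as positive) and uses toy labels in which 36 of 100 observations are positive. A model that predicts the negative class for everyone misses every positive, so FNR = FN / (FN + TP) = 1, and its accuracy equals the share of negatives.

```python
import math


def classify(p_hat, threshold=0.5):
    """Assumed rule: probability >= threshold -> positive class (1)."""
    return [1 if p >= threshold else 0 for p in p_hat]


def accuracy_and_fnr(y_true, p_hat, threshold=0.5):
    """Return (accuracy, false negative rate) at the given threshold."""
    y_pred = classify(p_hat, threshold)
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    correct = sum(1 for y, yp in zip(y_true, y_pred) if y == yp)
    accuracy = correct / len(y_true)
    fnr = fn / (fn + tp) if (fn + tp) > 0 else math.nan
    return accuracy, fnr


# Toy data: 36 positives, 64 negatives; every predicted probability is below 0.5,
# so the model assigns all observations to the negative class.
y_true = [1] * 36 + [0] * 64
p_hat = [0.3] * 100
accuracy, fnr = accuracy_and_fnr(y_true, p_hat)
print(accuracy, fnr)  # 0.64 1.0
```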

$\epsilon = 0.2$

[Screenshot (2024-03-06): snapshot results table; image not preserved]

$\epsilon = 0.1$

[Screenshot (2024-03-06): snapshot results table; image not preserved]

$\epsilon = 0.05$

[Screenshot (2024-03-06): snapshot results table; image not preserved]

$\epsilon = 0.01$

[Screenshot (2024-03-06): snapshot results table; image not preserved]

katcorr commented 4 months ago

Ah, that is interesting. Excellent work summarizing these already, and I'm glad you emailed Phil to confirm the threshold.