andymeneely / chromium-history

Scripts and data related to Chromium's history

Write up "risk factors" analysis #161

Closed andymeneely closed 10 years ago

andymeneely commented 10 years ago

I'm still trying to wrap my head around the queries, but I'd like to include a table in the paper that shows the risk factor for each metric at a given threshold. Something like this is what I have in mind:

---Number of Reviewers---
mean(vulnerable) = 100
mean(neutral) = 20
Threshold: 60 (avg of the two means)
p(a file is vulnerable | NumReviewers >= 60) = 10%
p(a file is vulnerable | NumReviewers < 60) = 1%
Thus, a file with 60 or more reviewers is 10x more likely to have a vulnerability than a file with fewer than 60
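
To make the arithmetic concrete, here is a rough sketch of the calculation in Python (the same steps should carry over to R). It takes the threshold as the average of the two group means, then divides the share of vulnerable files at or above the threshold by the share below it. The function name, argument names, and toy numbers are all made up for illustration and are not taken from the repository's queries or data.

```python
from statistics import mean


def relative_risk_at_threshold(values, vulnerable):
    """Hypothetical helper: values is one metric value per file,
    vulnerable is a matching list of booleans."""
    vuln_vals = [v for v, is_v in zip(values, vulnerable) if is_v]
    neut_vals = [v for v, is_v in zip(values, vulnerable) if not is_v]

    # Threshold: average of the two group means, as in the example above.
    threshold = (mean(vuln_vals) + mean(neut_vals)) / 2

    # Split files by the threshold, keeping only the vulnerable flag.
    above = [is_v for v, is_v in zip(values, vulnerable) if v >= threshold]
    below = [is_v for v, is_v in zip(values, vulnerable) if v < threshold]
    if not above or not below:
        raise ValueError("threshold does not split the files into two groups")

    p_above = sum(above) / len(above)  # p(vulnerable | metric >= threshold)
    p_below = sum(below) / len(below)  # p(vulnerable | metric < threshold)

    # Relative risk is undefined when no file below the threshold is vulnerable.
    rr = p_above / p_below if p_below > 0 else float("inf")
    return threshold, p_above, p_below, rr


# Toy data only: eight files, four of them vulnerable.
num_reviewers = [100, 120, 80, 20, 10, 30, 90, 20]
is_vulnerable = [True, True, True, False, False, False, False, True]
threshold, p_above, p_below, rr = relative_risk_at_threshold(num_reviewers, is_vulnerable)
print(threshold, p_above, p_below, rr)
```

Averaging the two means is only one way to pick a threshold; the rest of the calculation stays the same if a different cutoff is used.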

Some reading:

http://en.wikipedia.org/wiki/Risk_factor
http://en.wikipedia.org/wiki/Relative_risk

It's pretty easy to compute from a confusion matrix too. I just need to brush up on my R-fu.
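
For the confusion-matrix route, a minimal sketch (Python again; the arithmetic is the same in R) might look like this. It assumes "predicted positive" means the metric is at or above the threshold, so the four cells are counts of files. The function name and the counts are hypothetical, chosen only to reproduce the 10% and 1% figures from the example.

```python
def relative_risk(tp, fp, fn, tn):
    """Relative risk from a 2x2 confusion matrix (hypothetical helper).

    tp: vulnerable files at or above the threshold
    fp: neutral files at or above the threshold
    fn: vulnerable files below the threshold
    tn: neutral files below the threshold
    """
    risk_above = tp / (tp + fp)  # p(vulnerable | metric >= threshold)
    risk_below = fn / (fn + tn)  # p(vulnerable | metric < threshold)
    return risk_above / risk_below


# Made-up counts: 10 of 100 files above the threshold are vulnerable,
# and 9 of 900 files below it are vulnerable.
print(relative_risk(tp=10, fp=90, fn=9, tn=891))
```

Note that this ratio uses the rows of the matrix split by threshold, so it is the precision of the "above" group divided by the false omission rate of the "below" group.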