I'm still trying to wrap my head around the queries, but I'd like to include a table in the paper showing the risk factor for each metric at a given threshold. Something like this is what I have in mind:
---Number of Reviewers---
mean(vulnerable) = 100
mean(neutral) = 20
Threshold: 60 (avg of the two means)
p(a file is vulnerable | NumReviewers >= 60) = 10%
p(a file is vulnerable | NumReviewers < 60) = 1%
Thus, a file with 60 or more reviewers is 10x more likely to be vulnerable (relative risk = 10% / 1% = 10)
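A rough sketch of that calculation, in Python for now (it should port to R almost line for line). The function names and the toy data are made up for illustration, not taken from our dataset, and it assumes we have one metric value and one vulnerable/neutral flag per file:

```python
from statistics import mean


def midpoint_threshold(metric_values, is_vulnerable):
    """Threshold halfway between the vulnerable mean and the neutral mean."""
    vuln = [m for m, v in zip(metric_values, is_vulnerable) if v]
    neutral = [m for m, v in zip(metric_values, is_vulnerable) if not v]
    return (mean(vuln) + mean(neutral)) / 2


def relative_risk_at_threshold(metric_values, is_vulnerable, threshold):
    """P(vulnerable | metric >= threshold) / P(vulnerable | metric < threshold)."""
    above = [v for m, v in zip(metric_values, is_vulnerable) if m >= threshold]
    below = [v for m, v in zip(metric_values, is_vulnerable) if m < threshold]
    if not above or not below:
        raise ValueError("threshold leaves one group empty")
    p_above = sum(above) / len(above)
    p_below = sum(below) / len(below)
    if p_below == 0:
        raise ValueError("no vulnerable files below threshold; ratio undefined")
    return p_above / p_below


# Hypothetical toy data: number of reviewers per file and whether it is vulnerable.
num_reviewers = [120, 90, 75, 61, 10, 25, 30, 59, 5, 40]
vulnerable = [True, True, False, False, False, False, True, False, False, False]

threshold = midpoint_threshold(num_reviewers, vulnerable)
rr = relative_risk_at_threshold(num_reviewers, vulnerable, threshold)
print(f"threshold={threshold:.1f} relative risk={rr:.2f}")
```

Running the same two functions once per metric would give one row of the table each.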
Some reading:
http://en.wikipedia.org/wiki/Risk_factor http://en.wikipedia.org/wiki/Relative_risk
It's pretty easy to compute from a confusion matrix too. Just need to brush up on my R-fu
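For the confusion matrix route, here is a minimal sketch (Python again, pending the R version). It assumes "predicted positive" means the metric is at or above the threshold and "actual positive" means the file is vulnerable; the function name and the counts are hypothetical, chosen only to reproduce the 10% vs 1% example:

```python
def relative_risk_from_confusion(tp, fp, fn, tn):
    """Relative risk from confusion-matrix counts.

    tp: at/above threshold and vulnerable
    fp: at/above threshold and neutral
    fn: below threshold and vulnerable
    tn: below threshold and neutral
    """
    risk_above = tp / (tp + fp)  # P(vulnerable | metric >= threshold)
    risk_below = fn / (fn + tn)  # P(vulnerable | metric < threshold)
    return risk_above / risk_below


# Hypothetical counts: 100 of 1000 files above the threshold are vulnerable (10%),
# 100 of 10000 files below it are vulnerable (1%).
rr = relative_risk_from_confusion(tp=100, fp=900, fn=100, tn=9900)
print(f"relative risk = {rr:.1f}")
```

Note that the numerator is just the precision of the threshold as a classifier, and the function will raise ZeroDivisionError if either group is empty or no file below the threshold is vulnerable.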