Open redshiftzero opened 7 years ago
> percent of the ranked list flagged

Can you clarify what you mean a bit here? What is the ranked list? Flagged as in the classifier believes it's an SD?
> We want to know in a realistic scenario - i.e. one that incorporates the effect of the class imbalance - how effective these attacks are in terms of true and false positives.

TPR and FPR are metrics independent of class balance; it's precision that shifts with the base rate. I know you know this, but as written the phrasing is confusing.
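To make the point concrete, here is a minimal (hypothetical) numeric sketch: a classifier with fixed per-class behavior has the same TPR and FPR whether the data is balanced or not, while its precision drops as negatives are added.

```python
def rates(tp, fp, fn, tn):
    """Compute TPR, FPR, and precision from confusion-matrix counts."""
    tpr = tp / (tp + fn)        # true positive rate (recall)
    fpr = fp / (fp + tn)        # false positive rate
    precision = tp / (tp + fp)
    return tpr, fpr, precision

# Balanced case: 100 positives, 100 negatives
print(rates(tp=80, fp=10, fn=20, tn=90))
# TPR = 0.8, FPR = 0.1, precision ~= 0.89

# Same classifier behavior, but 10x the negatives:
# FP count scales with the negatives, so TPR/FPR are unchanged
# while precision falls.
print(rates(tp=80, fp=100, fn=20, tn=900))
# TPR = 0.8, FPR = 0.1, precision ~= 0.44
```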
We want to know in a realistic scenario - i.e. one that incorporates the effect of the class imbalance - how effective these attacks are in terms of true and false positives. A really nice plot that would show this (right now the machine learning pipeline generates only an ROC curve) is a graph of precision and recall as a function of k, the percent of the ranked list flagged. Let's add this to `evaluate.py`.

Also: see Figure 5 in this paper for a nice comparison between ROC curves and precision/recall graphs in the presence of different base rates.
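A sketch of what this could look like in `evaluate.py` — the function name and interface here are hypothetical, not the existing pipeline's API: rank samples by classifier score, flag the top k percent, and compute precision and recall at each k.

```python
import numpy as np

def precision_recall_at_k(scores, labels, ks):
    """For each fraction k in ks, flag the top k percent of the
    score-ranked list as positives and return (k, precision, recall)."""
    order = np.argsort(scores)[::-1]           # rank by descending score
    ranked_labels = np.asarray(labels)[order]
    n = len(ranked_labels)
    total_pos = ranked_labels.sum()
    results = []
    for k in ks:
        n_flagged = max(1, int(round(k * n)))  # flag top k percent
        tp = ranked_labels[:n_flagged].sum()   # true positives among flagged
        results.append((k, tp / n_flagged, tp / total_pos))
    return results

# Toy example: 2 true positives among 10 samples
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   0,   1,   0,   0,   0,   0,   0,   0,   0]
for k, p, r in precision_recall_at_k(scores, labels, [0.1, 0.3, 0.5]):
    print(f"k={k:.0%}  precision={p:.2f}  recall={r:.2f}")
# k=10%  precision=1.00  recall=0.50
# k=30%  precision=0.67  recall=1.00
# k=50%  precision=0.40  recall=1.00
```

Plotting precision and recall against k from this output would give exactly the base-rate-aware view described above.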