freedomofpress / fingerprint-securedrop

A machine learning data analysis pipeline for analyzing website fingerprinting attacks and defenses.
GNU Affero General Public License v3.0

Generate plot of precision/recall as a function of k #62

Status: Open. redshiftzero opened this issue 7 years ago.

redshiftzero commented 7 years ago

We want to know in a realistic scenario - i.e. one that incorporates the effect of the class imbalance - how effective these attacks are in terms of true and false positives. Right now the machine learning pipeline generates only an ROC curve. A plot that would show this well is a graph of precision and recall as a function of k, the percent of the ranked list flagged. Let's add this to evaluate.py.
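A minimal sketch of one way to compute these curves. It assumes the classifier emits a score per trace (higher meaning "more likely a SecureDrop visit"), that the "ranked list" means all test traces sorted by that score, and that k is given as a fraction in (0, 1]. The function names `precision_recall_at_k` and `plot_precision_recall_at_k` are hypothetical and are not existing functions in evaluate.py.

```python
import numpy as np


def precision_recall_at_k(scores, labels, ks):
    """Precision and recall when the top fraction k of a ranked list is flagged.

    scores: classifier scores; higher is assumed to mean "more likely positive".
    labels: 1 for the positive (SecureDrop) class, 0 otherwise.
    ks: fractions of the ranked list to flag, each in (0, 1].
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = len(scores)
    n_pos = labels.sum()

    # Rank examples from most to least suspicious.
    order = np.argsort(-scores, kind="stable")
    # cum_tp[i] = number of true positives among the top i + 1 ranked examples.
    cum_tp = np.cumsum(labels[order])

    precision, recall = [], []
    for k in ks:
        # Small tolerance guards against floating-point error in k * n;
        # always flag at least one example and never more than n.
        n_flagged = min(n, max(1, int(np.ceil(k * n - 1e-9))))
        tp = cum_tp[n_flagged - 1]
        precision.append(tp / n_flagged)
        recall.append(tp / n_pos if n_pos else 0.0)
    return np.array(precision), np.array(recall)


def plot_precision_recall_at_k(ks, precision, recall):
    """Plot precision and recall against the fraction of the ranked list flagged."""
    import matplotlib.pyplot as plt  # lazy import so the metrics work without matplotlib

    fig, ax = plt.subplots()
    ax.plot(ks, precision, marker="o", label="precision")
    ax.plot(ks, recall, marker="o", label="recall")
    ax.set_xlabel("k (fraction of ranked list flagged)")
    ax.set_ylabel("score")
    ax.set_ylim(0, 1.05)
    ax.legend()
    return fig
```

Using cumulative true-positive counts over the sorted labels means every k is evaluated in a single pass after one sort, so a fine grid of k values stays cheap. Recall rises monotonically with k, while precision shows how many flagged traces are actually SecureDrop visits under the test set's class balance.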

Also, Figure 5 in this paper gives a nice comparison between ROC curves and precision/recall graphs under different base rates.

psivesely commented 7 years ago

percent of the ranked list flagged

Can you clarify what you mean a bit here? What is the ranked list? Does "flagged" mean the classifier believes it's a SecureDrop (SD) site?

psivesely commented 7 years ago

We want to know in a realistic scenario - i.e. one that incorporates the effect of the class imbalance - how effective these attacks are in terms of true and false positives.

TPR and FPR are metrics independent of class balance. I know you know this, but the sentence above is phrased improperly, or at least confusingly.
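To make the distinction concrete: TPR (which equals recall) and FPR are conditioned on the true class, so they do not move with the base rate, whereas precision mixes both classes and does. A minimal sketch, where the helper name `precision_from_rates` and the TPR, FPR, and base-rate values are purely illustrative assumptions, not results from this pipeline:

```python
def precision_from_rates(tpr, fpr, base_rate):
    """Precision implied by a fixed TPR and FPR at a given positive-class base rate.

    precision = TPR * p / (TPR * p + FPR * (1 - p)), where p is the base rate.
    """
    true_pos = tpr * base_rate          # fraction of all traces that are true positives
    false_pos = fpr * (1.0 - base_rate)  # fraction of all traces that are false positives
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0


if __name__ == "__main__":
    # Same hypothetical classifier (TPR 0.9, FPR 0.01) at different base rates.
    for p in (0.5, 0.1, 0.01, 0.001):
        print(f"base rate {p}: precision {precision_from_rates(0.9, 0.01, p):.3f}")
```

With these illustrative numbers, precision is about 0.99 on a balanced set but only about 0.08 when 1 in 1000 traces is a SecureDrop visit, even though TPR and FPR are unchanged.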