Open davidmalcolm opened 1 year ago
The Juliet results tend to dwarf the other results and break my result-handling scripts.
Also, for Juliet we have an oracle of known results, so we can figure out True Positive vs False Positive vs True Negative vs False Negative.
Given that, we should special-case Juliet and compute Youden's J statistic (aka Youden's index) for the Juliet tests; see e.g.:
https://owasp.org/www-project-benchmark/#div-scoring
https://en.wikipedia.org/wiki/Youden%27s_J_statistic
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
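For reference, a minimal sketch of the computation, assuming the four confusion-matrix counts have already been extracted by comparing the analyzer output against the Juliet oracle. The function name `youden_j` and its parameters are hypothetical and not part of any existing script here. Youden's J is sensitivity plus specificity minus 1, i.e. true positive rate minus false positive rate.

```python
def youden_j(tp: int, fp: int, tn: int, fn: int) -> float:
    """Return Youden's J = sensitivity + specificity - 1.

    Hypothetical helper: the counts are assumed to come from comparing
    analyzer results against the Juliet oracle of known-bad/known-good cases.
    """
    positives = tp + fn  # test cases that really contain the flaw
    negatives = tn + fp  # test cases that really are clean
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / positives  # true positive rate
    specificity = tn / negatives  # true negative rate
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    # Made-up counts purely for illustration, not real Juliet results.
    print(youden_j(tp=3, fp=1, tn=1, fn=1))
```

J ranges from -1 to 1: a value of 0 means the tool does no better than random guessing, and 1 means it reports every real flaw with no false positives.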