HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

Calculating EER #49

Open nidhal1231 opened 5 years ago

nidhal1231 commented 5 years ago

I can't quite understand the process of calculating EER. I would be grateful if someone could explain it to me.

cbrochtrup commented 5 years ago

EER (equal error rate) is a performance metric for binary classification systems that output class likelihoods. It is the error rate at the operating point where the false positive rate and the false negative rate are equal (or as close to equal as possible). This corresponds to the point on the DET curve where the false alarm rate and the miss rate (false negative rate) intersect.
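Concretely, here is a minimal sketch of how EER is usually computed from a set of trial scores and ground-truth labels. The helper name `compute_eer` and the toy `labels`/`scores` arrays are just illustrative, not from this repo:

```python
# Minimal EER sketch using scikit-learn's ROC utilities (illustrative only).
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """EER: error rate at the threshold where the false positive rate
    equals the false negative (miss) rate."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    fnr = 1 - tpr
    # Pick the threshold where FPR and FNR are closest to equal.
    idx = np.nanargmin(np.abs(fnr - fpr))
    eer = (fpr[idx] + fnr[idx]) / 2  # average the two near-equal rates
    return eer, thresholds[idx]

# Toy example: 1 = same speaker, 0 = different speaker; scores are similarities.
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.4, 0.35, 0.1, 0.6, 0.7, 0.2])
eer, thr = compute_eer(labels, scores)
print(f"EER = {eer:.3f} at threshold {thr:.2f}")
```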

For a different visualization, read this.

nidhal1231 commented 5 years ago

@cbrochtrup Thanks a lot

chrisspen commented 5 years ago

@cbrochtrup That link basically says the EER is a worthless metric for any real work. Is that what you meant to link to?

"For starters, we can dismiss the equal error rate (EER). In all my research, I have yet to encounter a use case where having an equal probability of false accept or false reject was optimal. EER is an interesting academic construct but it’s hard to think of a use case where it matters."

cbrochtrup commented 5 years ago

Haha, I can see why you'd think that. My apologies! I don't think EER is useless. EER is the standard metric to compare system performance in the speaker verification literature.

Why does the writer think EER is useless? Because EER is not the right operating point for many industry applications, such as medical diagnosis or biometric security. For example, if you're building a system to detect cancer, you don't want the two error types to be equal, which is what operating at the EER would give you. In cancer detection you can tolerate false alarms, but a missed diagnosis (false negative) could cost a patient their life. The writer is likely a researcher in a field where EER is not the standard metric.

In summary, EER is used in research to compare systems, but in practice you would set the detection threshold to reduce whichever type of error is most costly. Does that make sense? Do you know what I mean by "threshold"?
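To illustrate the threshold idea, here is a small sketch (again with made-up toy scores, not from this repo) showing how moving the decision threshold trades false accepts against false rejects:

```python
# Illustrative threshold sweep on toy verification scores.
import numpy as np

labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # 1 = same speaker, 0 = impostor
scores = np.array([0.9, 0.8, 0.4, 0.35, 0.1, 0.6, 0.7, 0.2])

for threshold in (0.3, 0.5, 0.7):
    pred = scores >= threshold
    far = np.mean(pred[labels == 0])    # false accept rate (impostor trials accepted)
    frr = np.mean(~pred[labels == 1])   # false reject rate (genuine trials rejected)
    print(f"threshold={threshold:.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

Raising the threshold lowers FAR at the cost of FRR, and vice versa; EER is simply the point where the two curves cross.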