Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

hard-negative mining #11

Closed mangushev closed 5 years ago

mangushev commented 5 years ago

Hi, I saw in this article on FaceNet: https://blog.csdn.net/baidu_27643275/article/details/79222206 that they select all positives, but among the negatives that satisfy the criterion, they pick at random from that set instead of taking only the hardest negatives. This feels like a more representative way of picking samples than using only the hardest ones. Any views on this? Thanks!
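
For illustration, here is a minimal NumPy sketch of that selection scheme as I understand it (the names `sap`, `san`, and `sample_negatives` are hypothetical, and `alpha` stands for the triplet margin; this is not code from this repo):

```python
import numpy as np

def sample_negatives(sap, san, alpha, rng=None):
    """Pick one negative per anchor, FaceNet-style.

    sap: (N,) anchor-positive similarities
    san: (N, M) anchor-negative similarities (M candidate negatives)
    Instead of always taking the hardest negative (argmax of san),
    pick at random among the negatives that still violate the margin.
    """
    rng = rng or np.random.default_rng()
    chosen = np.empty(len(sap), dtype=int)
    for i in range(len(sap)):
        # candidates: negatives with non-zero triplet loss, san - sap + alpha > 0
        cand = np.nonzero(san[i] - sap[i] + alpha > 0)[0]
        if cand.size:
            chosen[i] = rng.choice(cand)        # random qualifying negative
        else:
            chosen[i] = int(np.argmax(san[i]))  # all easy: fall back to hardest
    return chosen
```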

Walleclipse commented 5 years ago

Hi, I think in the early stage of training it is crucial to select negative samples randomly, because:

  1. Random negative samples feed the model more varied data, which helps generalization.
  2. It is very hard for an untrained model to learn those "hard cases" directly at an early stage.

When the performance of the model stops improving, we consider selective sampling, because by then almost all the data has been learned effectively except for the few hard samples. For most triplets, the anchor-positive similarity (sap) is greater than the anchor-negative similarity (san), so the loss = max(san - sap + alpha, 0) ≈ 0 and the model can no longer be trained effectively on them. So we can choose only the samples for which loss > 0.
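
A rough sketch of what I mean, assuming cosine similarities and the max-based loss above (variable names are just for illustration, not from this repo):

```python
import numpy as np

def triplet_loss(sap, san, alpha):
    # per-triplet loss: max(san - sap + alpha, 0)
    return np.maximum(san - sap + alpha, 0.0)

def select_hard(sap, san, alpha):
    # keep only triplets that still contribute gradient (loss > 0);
    # the rest already satisfy the margin and teach the model nothing
    return np.nonzero(triplet_loss(sap, san, alpha) > 0)[0]

sap = np.array([0.9, 0.6, 0.8])          # anchor-positive similarities
san = np.array([0.5, 0.7, 0.2])          # anchor-negative similarities
print(select_hard(sap, san, alpha=0.1))  # [1]: only the second triplet is hard
```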

mangushev commented 5 years ago

Thanks! That clarifies it.