Open Jeff-Zilence opened 5 years ago
Thank you for sharing your code. Nice work! I have a question about the Binomial Deviance Loss. In your previous paper, Binomial without mining achieves a recall@1 of 64% on CUB200, but I cannot reproduce that result with your code. Could you provide more detail about your implementation, e.g. the values of alpha and beta?

alpha is 40 and beta is 2. Batch size is 60-80, with Adam at a learning rate of 1e-5.
Note that performance on CUB is not stable (2% higher or lower is not surprising). The other three datasets reach the paper's numbers much more easily.

Thank you very much. I reproduced the result of Binomial without mining. I think the problem is the normalization terms (1/alpha and 1/beta): they are removed in your non-mining version, and the accuracy starts dropping when I add them back. But you do use these normalization terms in the mining version. Are they designed to balance the gradients of positive and negative pairs?

Yes! Exactly that!
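For readers who want to experiment with the hyperparameters mentioned above, here is a minimal NumPy sketch of a binomial-deviance-style pairwise loss with an optional 1/alpha and 1/beta normalization, so you can compare the two variants discussed in this thread. The exact pair weighting and the margin value (0.5 here) are assumptions for illustration, not the repository's actual implementation.

```python
import numpy as np

def binomial_deviance_loss(sim, labels, alpha=40.0, beta=2.0,
                           margin=0.5, normalize=True):
    """Binomial-deviance-style pairwise loss on a similarity matrix.

    sim: (n, n) matrix of pairwise cosine similarities.
    labels: (n,) class labels; same label => positive pair.
    alpha/beta: scaling for positive/negative pairs (values from this thread).
    normalize: if True, divide each term by its scale (1/alpha, 1/beta),
               which balances positive vs. negative gradients.
    """
    sim = np.asarray(sim, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(labels)

    # Positive pairs: same label, excluding self-pairs on the diagonal.
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    neg_mask = labels[:, None] != labels[None, :]

    # Softplus penalties: push positive similarities above the margin,
    # negative similarities below it.
    pos_term = np.log1p(np.exp(-alpha * (sim[pos_mask] - margin)))
    neg_term = np.log1p(np.exp(beta * (sim[neg_mask] - margin)))

    if normalize:
        pos_term = pos_term / alpha
        neg_term = neg_term / beta

    return (pos_term.sum() + neg_term.sum()) / n
```

Comparing `normalize=True` against `normalize=False` on the same batch shows how strongly the 1/alpha and 1/beta terms rescale the two sides of the loss, which is the gradient-balancing effect the author confirms above.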