In eq. (3) of the paper, the target data similarity is calculated by cosine, but here it seems to be Euclidean? The default metric of pdist is Euclidean.
The cosine similarity calculated here is not computed in the common way, e.g. via F.normalize in PyTorch. Why? What is the concern?
The cosine similarity is finally scaled up by a factor of 30. Why?
In eq. (6) of the paper, the mean and std of the soft multilabels are updated by a moving average with weight 0.5, as described in the supplementary, but the code here uses batch_size / 10000. Why?
When features are normalized to unit norm, cosine similarity is equivalent to L2 distance when used as a distance metric: for unit vectors, ||a - b||^2 = 2 - 2 cos(a, b), so it is a linear transform of the squared L2 distance.
It seems natural to me, though: with unit-norm features, cosine similarity can be implemented as a single matrix multiplication, which is efficient.
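A quick sketch of the two points above (illustrative tensors only, not the repo's actual code): for unit-norm features, cosine similarity is a single matrix multiplication, and it differs from the squared Euclidean distance (pdist's default metric) only by a linear transform.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 128))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-norm rows

# cosine similarity via a single matrix multiplication
cos = feats @ feats.T

# pdist's default metric is Euclidean; for unit vectors
# ||a - b||^2 = 2 - 2*cos(a, b), i.e. a linear transform of cosine
d2 = squareform(pdist(feats)) ** 2
assert np.allclose(d2, 2 - 2 * cos)
```

So ranking neighbors by cosine similarity or by Euclidean distance gives the same ordering once the features are normalized.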
Basically, it is because the softmax loss has a bound that depends on the feature norm. For more details, please refer to the NormFace paper in the references.
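A minimal numeric sketch of why the scale matters (the factor 30 here just mirrors the value asked about; the numbers are made up): with unit-norm features the logits are bounded in [-1, 1], so the softmax output can never come close to a one-hot target without scaling.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

cos_logits = np.array([0.9, 0.1, -0.3])  # cosine logits, bounded in [-1, 1]

p_unscaled = softmax(cos_logits)       # max probability stays modest
p_scaled = softmax(30 * cos_logits)    # scaling lets the softmax saturate

print(p_unscaled.max())  # well below 1
print(p_scaled.max())    # very close to 1
```

Because the unscaled maximum probability is bounded away from 1, the cross-entropy loss has a nonzero lower bound; multiplying the cosine logits by a large scale removes that bound, which is the argument made in NormFace.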
Yeah, I think this is a version issue. Thanks for letting me know! I shall update the supplementary.
Thanks for your work