RozDavid / LanguageGroundedSemseg

Implementation for the ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

Text-supervised Contrastive optimization. #9

Closed yhyang-myron closed 1 year ago

yhyang-myron commented 1 year ago

Hi, could I ask why you chose this loss for your pre-training step? Did you try any other losses for pre-training, such as the InfoNCE loss? What is the difference between them?

RozDavid commented 1 year ago

Hey @yhyang-myron,

The intuition behind the InfoNCE loss and ours is the same: find positive and negative samples, and use some distance metric to compute similarity scores between the anchors and our query features. The differences are mainly implementation-wise:

1. our implementation allows weighting the effect of positive and negative samples,
2. it supports different distance types, and
3. it uses a more memory-efficient sampling scheme than the standard InfoNCE implementation.

Crucially, though, we do not apply cross-entropy as a logit normalization with the positive sample as the target; instead, we use the distance metric directly as the loss value, aiming for maximum separation from the negatives rather than maximum similarity to the anchors.
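To make the idea concrete, here is a minimal sketch of what such a distance-based anchor loss could look like in PyTorch. This is not the exact code from this repo; the function name, the margin hinge on the negative term, and the negative-sampling scheme here are illustrative choices:

```python
import torch
import torch.nn.functional as F


def anchor_contrastive_loss(features, labels, anchors,
                            pos_weight=1.0, neg_weight=1.0,
                            distance="cosine", margin=0.5,
                            num_neg_samples=None):
    """Distance-based contrastive loss against per-class anchors.

    features: (N, D) query features (e.g. per-point embeddings)
    labels:   (N,)   ground-truth class indices
    anchors:  (C, D) class anchor embeddings (e.g. text features)
    """
    if distance == "cosine":
        # Cosine distance: 1 - cosine similarity, bounded in [0, 2].
        sim = F.normalize(features, dim=1) @ F.normalize(anchors, dim=1).T
        dist = 1.0 - sim                                  # (N, C)
    else:
        # Squared Euclidean distance as an alternative metric.
        dist = torch.cdist(features, anchors) ** 2        # (N, C)

    # Positive term: pull each feature toward its own class anchor.
    pos = dist.gather(1, labels.unsqueeze(1)).squeeze(1)  # (N,)

    # Negative term: push features away from the other class anchors,
    # optionally subsampling negatives for memory efficiency.
    neg_mask = torch.ones_like(dist, dtype=torch.bool)
    neg_mask.scatter_(1, labels.unsqueeze(1), False)
    neg = dist[neg_mask].view(features.size(0), -1)       # (N, C-1)
    if num_neg_samples is not None:
        idx = torch.randint(neg.size(1),
                            (features.size(0), num_neg_samples),
                            device=neg.device)
        neg = neg.gather(1, idx)

    # No softmax / cross-entropy normalization: the raw distances are
    # the loss. A margin hinge keeps the negative term bounded.
    loss = (pos_weight * pos.mean()
            + neg_weight * F.relu(margin - neg).mean())
    return loss
```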

Running our loss function with cosine distance, even weighting for positives and negatives, and all class anchors as negative samples should in theory lead to the same results, though we haven't explicitly tested it. It seems like an interesting experiment if you would like to try it!
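With the illustrative sketch above, that experiment would correspond to something like the following (the shapes and class count here are just placeholders):

```python
feats = torch.randn(4096, 512)          # per-point features from the 3D backbone
labels = torch.randint(0, 20, (4096,))  # ground-truth class ids
anchors = torch.randn(20, 512)          # per-class text anchor embeddings

# Cosine distance, equal positive/negative weighting, and all other
# class anchors used as negatives (num_neg_samples=None).
loss = anchor_contrastive_loss(feats, labels, anchors,
                               pos_weight=1.0, neg_weight=1.0,
                               distance="cosine", num_neg_samples=None)
```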

Hope this answers your question!

Kind regards, David