RozDavid / LanguageGroundedSemseg

Implementation for the ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

Text-supervised Contrastive optimization. #9

Closed yhyang-myron closed 1 year ago

yhyang-myron commented 1 year ago

Hi, could I ask why you chose this loss for your pre-training step? Did you try any other losses for pre-training, such as the InfoNCE loss? What is the difference between them?

RozDavid commented 1 year ago

Hey @yhyang-myron,

The intuition behind the InfoNCE loss and ours is the same: find positive and negative samples, and use some distance metric to compute similarity scores between the anchors and our query features. The differences are mainly implementation-wise:

1. our implementation allows weighting the effect of positive and negative samples,
2. it supports different distance types, and
3. it uses a more memory-efficient sampling scheme than the standard InfoNCE implementation.

Crucially, though, we do not apply cross-entropy as a logit normalization with the positive sample as the target; instead, we use the distance metric directly as the loss value, aiming for maximum separation from the negatives rather than maximum similarity to the anchors.
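To make the idea concrete, here is a minimal sketch of what such a distance-based anchor loss could look like in PyTorch. This is not the exact code from this repo; the function name, the margin hinge on the negative term, and the negative-sampling scheme here are illustrative choices:

```python
import torch
import torch.nn.functional as F


def anchor_contrastive_loss(features, labels, anchors,
                            pos_weight=1.0, neg_weight=1.0,
                            distance="cosine", margin=0.5,
                            num_neg_samples=None):
    """Distance-based contrastive loss against per-class anchors.

    features: (N, D) query features (e.g. per-point embeddings)
    labels:   (N,)   ground-truth class indices
    anchors:  (C, D) class anchor embeddings (e.g. text features)
    """
    if distance == "cosine":
        # Cosine distance: 1 - cosine similarity, bounded in [0, 2].
        sim = F.normalize(features, dim=1) @ F.normalize(anchors, dim=1).T
        dist = 1.0 - sim                                  # (N, C)
    else:
        # Squared Euclidean distance as an alternative metric.
        dist = torch.cdist(features, anchors) ** 2        # (N, C)

    # Positive term: pull each feature toward its own class anchor.
    pos = dist.gather(1, labels.unsqueeze(1)).squeeze(1)  # (N,)

    # Negative term: push features away from the other class anchors,
    # optionally subsampling negatives for memory efficiency.
    neg_mask = torch.ones_like(dist, dtype=torch.bool)
    neg_mask.scatter_(1, labels.unsqueeze(1), False)
    neg = dist[neg_mask].view(features.size(0), -1)       # (N, C-1)
    if num_neg_samples is not None:
        idx = torch.randint(neg.size(1),
                            (features.size(0), num_neg_samples),
                            device=neg.device)
        neg = neg.gather(1, idx)

    # No softmax / cross-entropy normalization: the raw distances are
    # the loss. A margin hinge keeps the negative term bounded.
    loss = (pos_weight * pos.mean()
            + neg_weight * F.relu(margin - neg).mean())
    return loss
```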

Running our loss function with cosine distance, even weighting for positives and negatives, and all class anchors as negative samples should in theory lead to the same results, though we haven't explicitly tested it. It seems like an interesting experiment if you would like to try it!
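With the illustrative sketch above, that experiment would correspond to something like the following (the shapes and class count here are just placeholders):

```python
feats = torch.randn(4096, 512)          # per-point features from the 3D backbone
labels = torch.randint(0, 20, (4096,))  # ground-truth class ids
anchors = torch.randn(20, 512)          # per-class text anchor embeddings

# Cosine distance, equal positive/negative weighting, and all other
# class anchors used as negatives (num_neg_samples=None).
loss = anchor_contrastive_loss(feats, labels, anchors,
                               pos_weight=1.0, neg_weight=1.0,
                               distance="cosine", num_neg_samples=None)
```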

Hope this answers your question!

Kind regards, David