google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0
2.25k stars 147 forks

Loss Scale for Training Siglip #115

Open lezhang7 opened 3 months ago

lezhang7 commented 3 months ago

Hi,

Thanks for your great work. I was trying to apply the SigLIP loss for training contrastive models. However, I find the loss scale is quite small, usually around 0.003 at the beginning. I wonder if anything goes wrong in my implementation.

```python
import torch
import torch.nn.functional as F

def siglip_loss(logits):
    n = logits.size(0)
    # Pairwise labels: +1 on the diagonal (matching pairs), -1 elsewhere
    labels = 2 * torch.eye(n) - torch.ones(n, n)
    labels = labels.to(logits.device)
    loss = -torch.mean(F.logsigmoid(labels * logits)) / n
    return loss
```
udion commented 1 month ago

Did you normalize your feature vectors before computing the logits? The feature vectors should be unit vectors, i.e. L2-normalize each vector before taking the dot products.
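
For reference, a minimal sketch of that normalization step (the embedding names `img_emb` and `txt_emb` are illustrative, not from the thread):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img_emb = torch.randn(4, 8)  # batch of 4 hypothetical image embeddings
txt_emb = torch.randn(4, 8)  # batch of 4 hypothetical text embeddings

# L2-normalize each embedding to unit length, so the pairwise dot
# products (the logits) are cosine similarities in [-1, 1].
img_emb = F.normalize(img_emb, dim=-1)
txt_emb = F.normalize(txt_emb, dim=-1)

logits = img_emb @ txt_emb.t()  # (4, 4) similarity matrix
```

Note that in the SigLIP paper these cosine similarities are additionally scaled by a learnable temperature and shifted by a learnable bias before being fed to the sigmoid loss, so raw logits near zero (and hence a small initial loss under the code above) are expected without that scaling.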