Thanks for your great work. I was trying to apply the SigLIP loss for training contrastive models. However, I find the loss scale is quite small, usually around 0.003 at the beginning. I wonder if anything went wrong in my implementation:

```python
import torch
import torch.nn.functional as F

def siglip_loss(logits):
    n = logits.size(0)
    labels = 2 * torch.eye(n) - torch.ones(n, n)  # -1 everywhere, +1 on the diagonal
    labels = labels.to(logits.device)
    loss = -torch.mean(F.logsigmoid(labels * logits)) / n
    return loss
```
Did you normalize your feature vectors before computing the logits? The feature vectors should be unit vectors, i.e. L2-normalize each embedding before taking the dot products.
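For reference, here is a minimal sketch of a pairwise sigmoid loss with L2-normalized features. The scalar temperature `t` and bias `b` (learnable in the SigLIP paper; fixed illustrative values here) and the reduction — summing over all n×n pairs and dividing by the batch size once — are assumptions based on the paper's formulation, not taken from your snippet:

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # L2-normalize so the logits are cosine similarities, scaled and shifted
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    n = img_emb.size(0)
    logits = img_emb @ txt_emb.t() * t + b
    # +1 on the diagonal (matching pairs), -1 off-diagonal
    labels = 2 * torch.eye(n, device=logits.device) - 1
    # sum over all n*n pairs, then average over the batch only
    return -torch.sum(F.logsigmoid(labels * logits)) / n

```

Note the reduction: `torch.mean` already divides by n², so following it with another `/ n` shrinks the loss by an extra factor of n, which would produce exactly the kind of very small values you are seeing.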