LAION-AI / CLIP_benchmark

CLIP-like model evaluation

Performs normalization of the text embedding twice #90

Closed: Adamdad closed this issue 11 months ago

Adamdad commented 1 year ago

The zeroshot_classification.py script contains code (https://github.com/LAION-AI/CLIP_benchmark/blob/main/clip_benchmark/metrics/zeroshot_classification.py#L50) that normalizes the text embeddings twice. Specifically, PyTorch's F.normalize is called to normalize the per-prompt text embeddings along the last dimension, and the resulting tensor is averaged along the first dimension to obtain a single embedding vector. The code then normalizes this averaged vector again via class_embedding.norm(). This second normalization appears to be redundant and can be safely removed.

class_embeddings = model.encode_text(texts)                          # one embedding per prompt template
class_embedding = F.normalize(class_embeddings, dim=-1).mean(dim=0)  # normalize each, then average over templates
class_embedding /= class_embedding.norm()                            # second normalization
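
For context, this is roughly how the snippet above is used: a simplified sketch of the surrounding per-class loop, not verbatim repository code, assuming model, tokenizer, classnames, templates, and device are provided by the caller.

import torch
import torch.nn.functional as F

def build_zeroshot_classifier(model, tokenizer, classnames, templates, device):
    with torch.no_grad():
        weights = []
        for classname in classnames:
            # Several prompt templates per class, e.g. "a photo of a {}."
            texts = tokenizer([t.format(classname) for t in templates]).to(device)
            class_embeddings = model.encode_text(texts)                       # (n_templates, d)
            class_embedding = F.normalize(class_embeddings, dim=-1).mean(dim=0)
            class_embedding /= class_embedding.norm()                         # restore unit norm
            weights.append(class_embedding)
        return torch.stack(weights, dim=1)                                    # (d, n_classes)

Because each class is represented by several templates, class_embeddings is 2-D, which is why the mean over dim=0 appears at all.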
djghosh13 commented 1 year ago

I believe the second norm is necessary; note that class_embedding is the mean of multiple class_embeddings and isn't guaranteed to be normalized.
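
A quick way to see this (a minimal check, not part of the repository code):

import torch
import torch.nn.functional as F

# Two distinct unit vectors: their mean is strictly shorter than a unit vector.
embs = F.normalize(torch.tensor([[1.0, 0.0], [0.0, 1.0]]), dim=-1)
mean = embs.mean(dim=0)                     # tensor([0.5000, 0.5000])
print(mean.norm().item())                   # ~0.7071, not 1.0
print((mean / mean.norm()).norm().item())   # 1.0 after the second normalization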

mehdidc commented 11 months ago

Indeed, after averaging, the embedding is no longer guaranteed to be normalized, so this step is still needed.
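
As a side note (an equivalence check, assuming nothing beyond PyTorch itself): the in-place division is the same operation as calling F.normalize on the averaged vector, so either spelling works. The norm of the mean equals 1 only in the degenerate case where all template embeddings are identical.

import torch
import torch.nn.functional as F

emb = torch.randn(512)
assert torch.allclose(emb / emb.norm(), F.normalize(emb, dim=-1))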