paulgavrikov closed this issue 8 months ago.
Hi @paulgavrikov, apologies for the delayed response. During training of the EVA-CLIP-8B and EVA-CLIP-18B models we kept logit_scale fixed, so logit_scale.exp() was always 100. Please ignore the logit_scale stored in the models on Hugging Face: it was corrupted by a mistake in the weight transformation process, but this does not affect how the EVA-CLIP models should be used.
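A minimal pure-Python sketch of what the probability formula from the EVA-CLIP docs, (100.0 * image_features @ text_features.T).softmax(dim=-1), computes for a single image. It assumes the features are L2-normalized first, as in standard CLIP usage. It also hardcodes the multiplier to 100, as described in the reply above, instead of reading logit_scale from the Hugging Face checkpoint. The names LOGIT_SCALE, l2_normalize and zero_shot_probs are illustrative only and are not part of any EVA-CLIP API.

```python
import math

# Assumption based on the maintainers' reply: logit_scale.exp() was held at 100
# during training, so the multiplier is hardcoded instead of taken from the checkpoint.
LOGIT_SCALE = 100.0


def l2_normalize(vec):
    """Scale a vector to unit length, as CLIP does before computing similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_feature, text_features, scale=LOGIT_SCALE):
    """Pure-Python analogue of (scale * image_features @ text_features.T).softmax(dim=-1)
    for one image; features are normalized here in case they are not already."""
    img = l2_normalize(image_feature)
    logits = []
    for txt in text_features:
        t = l2_normalize(txt)
        cosine = sum(a * b for a, b in zip(img, t))  # cosine similarity
        logits.append(scale * cosine)
    return softmax(logits)


# Toy 3-d "embeddings": one image and two text prompts.
probs = zero_shot_probs([0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(probs)
```

With real models the vectors would come from the image and text encoders. The pure-Python loop only mirrors the tensor expression for readability.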
Hi,
Thank you so much for these very cool models. In your docs, you compute the probability by:
text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
I am wondering about the 100.0 * multiplier. In the original OpenAI model, this is because 100 = model.logit_scale.exp() (see the forward pass of CLIP). However, for your models I get very different measurements:
model.logit_scale.exp() == 56.9953
So shouldn't the multiplier be 56.9953?
model.logit_scale.exp() == inf, because model.scale == 100
Is this just incorrectly hardcoded? Obviously, this only changes the confidence, not the accuracy, but could you kindly explain what the correct multiplier is for your models?
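The two observations in the question can be checked with a small pure-Python sketch. The cosine values below are made up for illustration. Because a positive multiplier preserves the order of the logits and softmax is monotonic, going from 56.9953 to 100 only sharpens the probabilities and leaves the top-ranked class unchanged. Separately, exp(100) exceeds the largest float32 value, which would be consistent with a checkpoint that stores 100 where ln(100) ≈ 4.6052 was expected. That reading is an assumption; the thread itself only says the weight transformation was faulty.

```python
import math

# Largest finite IEEE-754 float32 value; anything larger overflows to inf.
FLOAT32_MAX = 3.4028234663852886e38


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical cosine similarities between one image and three text prompts.
cosines = [0.31, 0.27, 0.22]

top = {}    # predicted class for each multiplier
p_top = {}  # probability assigned to that class
for scale in (56.9953, 100.0):
    probs = softmax([scale * c for c in cosines])
    top[scale] = argmax(probs)
    p_top[scale] = probs[top[scale]]
    print(f"scale={scale}: top class {top[scale]}, p={p_top[scale]:.3f}")

# CLIP stores the log of the multiplier, so ln(100) exponentiates back to ~100.
print(math.exp(math.log(100.0)))
# exp(100) is about 2.69e43, far beyond FLOAT32_MAX, so if the raw value 100 were
# stored as logit_scale, calling .exp() on a float32 tensor would give inf.
print(math.exp(100.0) > FLOAT32_MAX)  # True
```

The same top class is predicted for both multipliers. Only the confidence assigned to it changes.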