Open shreyajain4 opened 2 years ago
@FreddeFrallan please have a look
Hello, I looked into this issue too.
I think the issue is related to the CLIP embedding size: it is 512 for the ViT model and 640 for the ResNet one. Since M-BERT-Base-69-ViT uses CLIP ViT, the 512 seems right.
However, I think out_features should be included in the configuration to prevent this kind of misunderstanding.
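For example (a minimal sketch of that suggestion, not the repo's actual code; the dict keys and dimension values are my assumptions), the head size could live in the per-model config and be used when the projection layer is built:

```python
import torch

# Hypothetical per-model config that carries the CLIP embedding size explicitly.
# The 512/640 values follow the ViT-vs-ResNet distinction mentioned above.
HEAD_CONFIG = {
    'M-BERT-Base-ViT-B': {'in_features': 768, 'out_features': 512},  # CLIP ViT head
    'M-BERT-Distil-40':  {'in_features': 768, 'out_features': 640},  # assuming this one pairs with a 640-dim head
}

def build_clip_head(model_name):
    # out_features comes from the config instead of a hard-coded 640, so the
    # Linear repr always matches the pickled head weights for that model.
    cfg = HEAD_CONFIG[model_name]
    return torch.nn.Linear(cfg['in_features'], cfg['out_features'])
```

That way a size mismatch would surface as a shape error at load time instead of hiding behind a misleading module repr.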
I tried the following piece of code, which is present in the repo at https://github.com/FreddeFrallan/Multilingual-CLIP/blob/main/src/multilingual_clip.py.
The only change I made is adding print statements in between.
```python
import pickle

import torch
import transformers

AVAILABLE_MODELS = {
    'M-BERT-Distil-40': {
        'model_name': 'M-CLIP/M-BERT-Distil-40',
        'tokenizer_name': 'M-CLIP/M-BERT-Distil-40',
        'head_name': 'M-BERT Distil 40 Linear Weights.pkl'
    },
    # ... other entries from the repo, including 'M-BERT-Base-ViT-B',
    # whose head_name is 'M-BERT-Base-69-ViT Linear Weights.pkl' ...
}


class MultilingualClip2(torch.nn.Module):
    def __init__(self, model_name, tokenizer_name, head_name, weights_dir='data/weights/'):
        super().__init__()
        self.model_name = model_name
        self.tokenizer_name = tokenizer_name
        self.head_path = weights_dir + head_name
        # ... rest of __init__, forward() and the head-loading code as in the
        # repo file linked above, plus my added print statements ...


def load_model2(name):
    config = AVAILABLE_MODELS[name]
    return MultilingualClip2(**config)


mod = load_model2('M-BERT-Base-ViT-B')
z = mod(Query[0])  # Query holds the input text (defined elsewhere in my script)
```
Output for this code:

```
ok torch.Size([512, 768]) torch.Size([512])
embs_text torch.Size([1, 6, 768])
att_text torch.Size([1, 6])
embs_text torch.Size([1, 768])
clip head obj Linear(in_features=768, out_features=640, bias=True)
cliphed_text torch.Size([1, 512])
```
This output suggests that the file 'M-BERT-Base-69-ViT Linear Weights.pkl' does not hold a 640 × 768 weight matrix but a 512 × 768 one.
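To double-check this independently of the model class, one could load the pickle directly (a sketch; it assumes the file holds a (weight, bias) pair, which is what the repo's head-loading code and the first print above imply, and that the file sits under data/weights/):

```python
import pickle
import numpy as np

head_path = 'data/weights/M-BERT-Base-69-ViT Linear Weights.pkl'  # adjust to your weights_dir

with open(head_path, 'rb') as f:
    weight, bias = pickle.load(f)

# The printout above suggests this reports a 512-dim head (512 x 768 weight,
# 512-long bias) rather than the 640 the Linear layer advertises.
print(np.asarray(weight).shape, np.asarray(bias).shape)
```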
Is there an issue with the config, then?
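If the 512 × 768 shape is confirmed, one possible workaround on the loading side (just a sketch, not the maintainer's fix; it assumes the (weight, bias) pickle layout above) would be to size the projection layer from the pickled weights instead of hard-coding out_features=640:

```python
import pickle
import torch

def load_clip_head(head_path):
    with open(head_path, 'rb') as f:
        weight, bias = pickle.load(f)
    weight = torch.as_tensor(weight).float()
    bias = torch.as_tensor(bias).float()
    # nn.Linear stores its weight as (out_features, in_features); if the pickled
    # matrix is stored the other way around, transpose it so it lines up with bias.
    if weight.shape[0] != bias.shape[0]:
        weight = weight.t()
    head = torch.nn.Linear(weight.shape[1], weight.shape[0])
    head.weight = torch.nn.Parameter(weight)
    head.bias = torch.nn.Parameter(bias)
    return head  # repr now reports the true out_features (512 here), not 640
```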