Line 41 of modeling_medclip.py is commented out. After I uncomment this line, the program runs normally. Am I doing the right thing? Sincerely waiting for your answer.
Thanks for noticing! You can fix it by uncommenting that line, and then use MedCLIPVisionModelViT normally. The problem comes from the different pretraining strategies of the ViT- and ResNet-based models. I will fix the ResNet version later.
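For reference, here is a minimal sketch of what uncommenting that line presumably restores. The class and attribute names below are assumptions for illustration, not the repo's actual code; the point is that the projection head maps the 768-dim encoder output into the 512-dim joint space shared with the text embeddings.

```python
import torch
from torch import nn

# Sketch of the embedding shape flow (names are hypothetical, not from the repo).
class VisionBranchSketch(nn.Module):
    def __init__(self, hidden_dim=768, proj_dim=512):
        super().__init__()
        self.backbone = nn.Identity()  # stands in for the ViT/ResNet encoder output
        # The commented-out line is presumably this projection head:
        self.projection_head = nn.Linear(hidden_dim, proj_dim)

    def forward(self, feats):
        img_embeds = self.backbone(feats)              # (batch, 768) without projection
        img_embeds = self.projection_head(img_embeds)  # (batch, 512), matches the text side
        return img_embeds

emb = VisionBranchSketch()(torch.randn(2, 768))
print(emb.shape)  # torch.Size([2, 512])
```

With the projection in place, image and text embeddings share a 512-dim space, so the matmul in compute_logits has compatible shapes.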
It seems that the logits from the ResNet-50 pre-trained weights do not work well. I get extremely low accuracy for prompt classification when I use ResNet-50, but it works pretty well with the ViT backbone.
Thanks for sharing.
```
Traceback (most recent call last):
  File "D:/Projects/MedCLIP/11.py", line 20, in <module>
    outputs = model(**inputs)
  File "D:\Programming environment\Python--3.8.9\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Projects\MedCLIP\medclip\modeling_medclip.py", line 216, in forward
    logits_per_image = self.compute_logits(img_embeds, text_embeds)
  File "D:\Projects\MedCLIP\medclip\modeling_medclip.py", line 230, in compute_logits
    logits_per_text = torch.matmul(text_emb, img_emb.t()) * logit_scale
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x768 and 512x1)
```
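The error itself just means the two embeddings live in different dimensions (768 vs. 512) because one branch skips its projection. A minimal reproduction, with the batch sizes and which side is unprojected inferred from the error message:

```python
import torch

# One embedding is unprojected (768-dim), the other projected (512-dim),
# so the matmul inside compute_logits cannot line up.
text_emb = torch.randn(2, 768)   # 2 prompts, missing the 768 -> 512 projection
img_emb = torch.randn(1, 512)    # 1 image, already in the 512-dim joint space
logit_scale = torch.tensor(100.0)

logits_per_text = torch.matmul(text_emb, img_emb.t()) * logit_scale
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x768 and 512x1)
```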
I haven't used Hugging Face before. Is the problem with the BERT model, or something else?
Waiting for your reply. Thanks. Sincerely.