Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
622 stars 31 forks source link

Performance of Declip-88M checkpoint #18

Open Hcyang-NULL opened 2 years ago

Hcyang-NULL commented 2 years ago

Hi, I want to reproduce the zero-shot result of DeClip-88M under ResNet50 in ImageNet-1K (whose performance is 62.5 in the table). But the evaluation result I got is 7.264 which is too low. But the result of ViT-B32 is correct. And I found a problem during loading the ResNet50 checkpoint:

size mismatch for module.logit_scale: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).

I didn't change any code of the model.

Another question is that why run.sh of declip-88m-resnet50 uses clip_solver while other run.sh files use declip_solver? I use declip_solver to do the evaluation for DeClip-88M-ResNet50 by replacing the yaml file. The following figure is the results reproduced on my own compute resources:

image

Do you have any ideas? Thanks!

AadSah commented 1 year ago

Hi @Hcyang-NULL , were you able to figure out the issue? cc: @zlccccc

zlccccc commented 1 year ago
size mismatch for module.logit_scale: copying a param with shape torch.Size([]) from checkpoint, the shape in current model is torch.Size([1]).

This problem is because the saved models come from different torch versions. You can forcibly convert logit_scale to torch.size([]) or torch.size([1]) when loading the model, which will not affect the accuracy.

Hcyang-NULL commented 1 year ago

Thanks for your reply!

I have tried this method before (forcibly reshape the logit_scale). It doesn't work, the performance is still 7.264. But the result of Vit is indeed correct, maybe the checkpoint of resnet50 is inconsistent with the code version? (I guess)

fuchun-wang commented 1 year ago

Excuse me, Could you tell me Where can I find the file named 'val_official.json'? @Hcyang-NULL