Ucas-HaoranWei / Vary

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Loading the model after fine-tuning the CLIP image encoder #69

Open samaritan1998 opened 8 months ago

samaritan1998 commented 8 months ago

If I have fine-tuned the CLIP image encoder, do I need to change the model-loading code like this? Replace

self.vision_tower = CLIPVisionModel.from_pretrained('/cache/vit-large-patch14/')

with

config = CLIPConfig.from_pretrained("/disk/Vary/clip-vit-large-patch14/")
self.vision_tower = CLIPVisionModel(config=config.vision_config)
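For context, here is a minimal sketch of how the two constructions differ in Hugging Face transformers: `from_pretrained` reads both the config and the weights from a directory, while passing only a config to the constructor builds the architecture with freshly initialised weights. The tiny config values, the temporary checkpoint directory, and the `same_weights` helper are hypothetical and exist only so the sketch runs quickly and offline. It uses `CLIPVisionConfig` directly because only a vision model is saved here, whereas the snippet above goes through `CLIPConfig` and its `vision_config`.

```python
import tempfile

import torch
from transformers import CLIPVisionConfig, CLIPVisionModel

# Tiny, hypothetical config (assumption, not the real ViT-L/14 sizes)
# so the sketch runs quickly without downloading anything.
tiny_config = CLIPVisionConfig(
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    image_size=28,
    patch_size=14,
)


def same_weights(a, b):
    """Return True if two models hold identical parameters (hypothetical helper)."""
    sa, sb = a.state_dict(), b.state_dict()
    return sa.keys() == sb.keys() and all(torch.equal(sa[k], sb[k]) for k in sa)


# Stand-in for a fine-tuned encoder that has been saved to disk.
finetuned = CLIPVisionModel(tiny_config)

with tempfile.TemporaryDirectory() as ckpt_dir:
    finetuned.save_pretrained(ckpt_dir)

    # Variant 1: from_pretrained loads the config AND the saved weights.
    loaded = CLIPVisionModel.from_pretrained(ckpt_dir)

    # Variant 2: constructing from a config alone gives newly initialised weights.
    config = CLIPVisionConfig.from_pretrained(ckpt_dir)
    from_config = CLIPVisionModel(config)

loaded_matches = same_weights(finetuned, loaded)
from_config_matches = same_weights(finetuned, from_config)
print(loaded_matches, from_config_matches)
```

Under these assumptions, the config-only variant does not carry over the saved weights by itself. Whether that matters in Vary depends on whether a later step loads a state dict that already includes the vision tower parameters, which this sketch does not cover.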