TempleX98 / MoVA

[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Apache License 2.0

Dinov2 seems to fail in some cases #4

Closed liuyifan22 closed 3 months ago

liuyifan22 commented 3 months ago

Hello! Thanks for your marvellous work. I'm learning from your eval code, but I have run into an unexpected error. It seems that Dinov2 is not functioning properly. The error is:

mova/mova/model/vision_experts/dinov2/modeling_dinov2.py", line 415, in forward
self.norm1(hidden_states), # in Dinov2, layernorm is applied before self-attention
xxx xxx
RuntimeError: Expected weight to be of same shape as normalized_shape, but got weight of shape [1536] and normalized_shape = [1024]

Here is the config for my downloaded dinov2: image

I guess there may be some incompatibility with the Dinov2 checkpoint I downloaded. Could you please share the Dinov2 model and checkpoint you use? Thanks!

TempleX98 commented 3 months ago

I am sorry for the mistake. The config.json file had the wrong DINOv2 model name: it should be facebook/dinov2-giant, not the large model. I’ve fixed the issue now.
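
For anyone hitting the same RuntimeError, the two numbers in the message already point to the cause: 1024 is the hidden size of facebook/dinov2-large and 1536 is that of facebook/dinov2-giant. The sketch below is only an illustration, not code from the MoVA repository. The `diagnose` helper and its wording are hypothetical, and reading `normalized_shape` as "the model that was built" and the weight shape as "what the loaded weights expect" is an assumption about how the mismatch arose.

```python
# Hidden sizes of the public DINOv2 checkpoints on the Hugging Face Hub.
DINOV2_HIDDEN_SIZES = {
    "facebook/dinov2-small": 384,
    "facebook/dinov2-base": 768,
    "facebook/dinov2-large": 1024,
    "facebook/dinov2-giant": 1536,
}


def variants_for_width(width):
    """Return the DINOv2 checkpoint names whose hidden size equals `width`."""
    return [name for name, size in DINOV2_HIDDEN_SIZES.items() if size == width]


def diagnose(weight_dim, normalized_dim):
    """Hypothetical helper: map the two sizes from the LayerNorm error to variants.

    Assumes `normalized_dim` reflects the config the model was built from and
    `weight_dim` reflects the weights that were loaded into it.
    """
    if weight_dim == normalized_dim:
        return "no mismatch"
    have = variants_for_width(normalized_dim) or ["unknown"]
    need = variants_for_width(weight_dim) or ["unknown"]
    return f"model built as {have[0]}, weights expect {need[0]}"


# Sizes taken from the error message in this issue.
print(diagnose(weight_dim=1536, normalized_dim=1024))
# prints: model built as facebook/dinov2-large, weights expect facebook/dinov2-giant
```

If the check reports a mismatch, comparing the `hidden_size` field in the downloaded DINOv2 config.json against the table above is a quick way to confirm which variant is on disk.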

liuyifan22 commented 3 months ago

Thanks for your timely reply! After switching to the giant version, the eval runs smoothly.