FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0

unable to load local weight #3

Closed liukc19 closed 2 months ago

liukc19 commented 2 months ago

I manually downloaded the model weights from Hugging Face and tried to fine-tune the model with the following command:

bash scripts/vl_finetune.sh ./groma-7b-pretrain ./train_history/

But the program still tries to fetch the weights from the website. I added some debug output [screenshots not preserved] and captured the result [screenshot not preserved]. It seems that the program is unable to load the local model weights?

Looking forward to your reply.

liukc19 commented 2 months ago

Sorry, the issue seems to be occurring here [screenshots not preserved]. So what is the vis_encoder_path?

machuofan commented 2 months ago

Thanks for the feedback. The bug occurs because the program looks for a local DINOv2-L checkpoint to initialize CustomDDETRModel. This is not the expected behavior. A quick fix is to delete the entry vis_encoder_path: checkpoints/dinov2-large from groma-7b-pretrain/config.json. I will fix the initialization logic soon.
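For anyone who prefers to script the quick fix above rather than edit the file by hand, here is a minimal sketch. It assumes vis_encoder_path is a top-level key in config.json; the helper name drop_config_key is hypothetical and not part of the Groma repository.

```python
import json
from pathlib import Path


def drop_config_key(config_path, key="vis_encoder_path"):
    """Remove `key` from a JSON config file and return the removed value.

    Returns None (and leaves the file untouched) if the key is absent.
    Assumes `key` sits at the top level of the JSON object.
    """
    path = Path(config_path)
    original_text = path.read_text()
    config = json.loads(original_text)

    removed = config.pop(key, None)
    if removed is not None:
        # Keep a copy of the unmodified config next to the original file.
        backup = path.with_name(path.name + ".bak")
        backup.write_text(original_text)
        path.write_text(json.dumps(config, indent=2))
    return removed
```

Calling drop_config_key("groma-7b-pretrain/config.json") once before launching the fine-tuning script should have the same effect as deleting the line manually; the path is taken from the command earlier in this thread, so adjust it to wherever the checkpoint was downloaded.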

liukc19 commented 2 months ago

Thank you for your prompt feedback. I have encountered a new issue. Do you know why this is happening? [screenshot not preserved]

liukc19 commented 2 months ago

I think this error comes from a failed installation of mmcv. Could you clarify the relationship between the mmcv folder in your repository and the mmcv package?

machuofan commented 2 months ago

We inherited the mmcv folder from GPT4ROI. I think it originated from mmcv==1.4.7.
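Since both a vendored mmcv folder and a pip-installed mmcv package can exist side by side, it may help to check which one Python actually resolves. The snippet below is only a sketch using the standard library; the helper name locate_module is hypothetical.

```python
import importlib.util


def locate_module(name):
    """Return the file path Python would import `name` from, or None.

    None means the module is not found, or it resolves to a namespace
    package with no single origin file.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin
```

Running print(locate_module("mmcv")) from the repository root shows whether the import resolves to the repository's own folder or to a copy under site-packages.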