FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0
483 stars 55 forks

model weight problem #6

Closed liukc19 closed 2 months ago

liukc19 commented 2 months ago

I tried to finetune Groma on the REC datasets only (RefCOCO/+/g), but got bad results on refcoco_val (with iou@0.5 accuracy and m_iou about 0.5). I also evaluated Groma on refcoco_val with the groma-7b-pretrain weights and got the following result.

Screenshot 2024-05-09 17 28 32

Is this result normal?
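For reference, the two metrics mentioned above (accuracy at IoU 0.5 and mean IoU) can be sketched as follows. This is a minimal illustration, not the evaluation script from the Groma repository: the function names `box_iou` and `rec_metrics` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and a prediction is counted as correct when its IoU is at least 0.5 (some evaluation scripts use a strict `>` instead).

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rec_metrics(pred_boxes: List[Box], gt_boxes: List[Box],
                thresh: float = 0.5) -> Tuple[float, float]:
    """Return (accuracy at the IoU threshold, mean IoU) for paired boxes."""
    ious = [box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    if not ious:
        return 0.0, 0.0
    acc = sum(iou >= thresh for iou in ious) / len(ious)
    mean_iou = sum(ious) / len(ious)
    return acc, mean_iou


# Hypothetical example: one perfect prediction, one shifted by half a box.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
acc, mean_iou = rec_metrics(preds, gts)
print(f"acc@0.5={acc:.3f} mIoU={mean_iou:.3f}")
```

With one pair per referring expression, an accuracy near 0.5 means roughly half of the predicted boxes miss the IoU 0.5 bar.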

liukc19 commented 2 months ago

env:

liukc19 commented 2 months ago

I tried to export the pred_boxes and drew them in the pictures.

Screenshot 2024-05-10 09 46 50 Screenshot 2024-05-10 09 44 28

The red bounding boxes are the ground-truth boxes, and the blue boxes correspond to the prediction results and the model proposals, respectively. It seems that the poor prediction results were caused by bad proposals.

As for why the loss still decreased during training, I believe it is because the ground-truth bounding boxes are injected.

Screenshot 2024-05-10 09 51 20

Thanks for your help in advance.

liukc19 commented 2 months ago

I also tried the groma-7b-finetune weights, but got the same result. Is it possible that these errors come from this commit (#3)? Maybe the real weights are in this path: "vis_encoder_path": "checkpoints/dinov2-large"?

machuofan commented 2 months ago

Hi there, sorry for the late reply. I agree that the problem probably originates from model initialization. Could you please try downloading the DINOv2 checkpoint and changing lines 104-107 in groma/model/ddetr.py from

if pretrained_vis_encoder is not None:
    self.vis_encoder = Dinov2Model.from_pretrained(pretrained_vis_encoder)
else:
    self.vis_encoder = Dinov2Model(config.vis_encoder_cfg)

to

self.vis_encoder = Dinov2Model.from_pretrained({path_to_dinov2_ckpt})

which forces the model to initialize from the pretrained DINOv2 weights.

liukc19 commented 2 months ago

Thank you for your suggestions, I'll try it later.

liukc19 commented 2 months ago

I tried the method you suggested but got the same result.

image

Can you reproduce the result in your local environment (with the finetuned model weights)?

machuofan commented 2 months ago

I found that this error was caused by a misconfiguration of the hyperparameter nms_thres, which was set to 0.0 but should be 0.6. It is now fixed in the latest commit. Please feel free to give it a try.
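To illustrate why a threshold of 0.0 would wipe out most proposals, here is a minimal sketch of standard greedy NMS in plain Python. It assumes that nms_thres is the usual IoU threshold (a box is dropped when its IoU with a higher-scoring kept box exceeds the threshold, as in `torchvision.ops.nms`); the helper names and the example boxes are hypothetical and are not taken from the Groma code.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thres: float) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any kept box by more than iou_thres.
        if all(box_iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: box 1 barely touches box 0, box 2 is a near-duplicate
# of box 0, and box 3 is far away from everything else.
boxes = [(0, 0, 10, 10), (9, 9, 20, 20), (1, 1, 11, 11), (30, 30, 40, 40)]
scores = [0.9, 0.8, 0.7, 0.6]

print(greedy_nms(boxes, scores, iou_thres=0.0))  # any overlap at all is suppressed
print(greedy_nms(boxes, scores, iou_thres=0.6))  # only near-duplicates are suppressed
```

Under this assumption, a threshold of 0.0 removes every proposal that overlaps a higher-scoring one even slightly, so objects that sit next to each other lose their proposals, while 0.6 only removes near-duplicates.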