Question about caption evaluation

ch3cook-fdu / Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

MIT License


Question about caption evaluation #5

Closed: 8reaks closed this issue 9 months ago

8reaks commented 9 months ago

Hi,

Where can I find the pretrained weights for 4.1 or 4.2? I have tested the weights provided on Hugging Face with 4.3, but they all return errors, as shown in the image.

ch3cook-fdu commented 9 months ago

The error in the image indicates that the weights were not loaded properly. This may be caused by differences in the implementation of GPT2Attention across versions of Hugging Face transformers. In my case (transformers==4.30.2), the weights for the attention bias are always ignored. If you still encounter the error after installing transformers==4.30.2, please try changing this line to:

model.load_state_dict(checkpoint["model"], strict=False)

to skip the loading of attention bias.
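
As an alternative to relaxing the whole load with strict=False, the attention-bias entries can be dropped from the checkpoint before loading, so the remaining keys are still checked strictly. The sketch below is illustrative only: the helper name `drop_attention_buffers` is hypothetical, and the key suffixes are an assumption based on the buffer names `bias` and `masked_bias` used by GPT-2 attention modules in Hugging Face transformers. Adjust them to the keys listed in your own error message.

```python
from typing import Any, Dict, Iterable

# Assumed suffixes of the GPT-2 attention mask buffers (not taken from this repo).
# Learned parameters such as "...attn.c_attn.bias" do not end with these suffixes.
ATTENTION_BUFFER_SUFFIXES = (".attn.bias", ".attn.masked_bias")


def drop_attention_buffers(
    state_dict: Dict[str, Any],
    suffixes: Iterable[str] = ATTENTION_BUFFER_SUFFIXES,
) -> Dict[str, Any]:
    """Return a copy of state_dict without entries whose key ends with a suffix."""
    suffix_tuple = tuple(suffixes)
    return {
        key: value
        for key, value in state_dict.items()
        if not key.endswith(suffix_tuple)
    }


# Toy checkpoint with placeholder values; real checkpoints map names to tensors.
toy_checkpoint = {
    "captioner.transformer.transformer.h.0.attn.bias": "mask buffer",
    "captioner.transformer.transformer.h.0.attn.masked_bias": "mask buffer",
    "captioner.transformer.transformer.h.0.attn.c_attn.bias": "learned bias",
    "captioner.transformer.transformer.wpe.weight": "position embeddings",
}

for key in drop_attention_buffers(toy_checkpoint):
    print(key)
```

With a real checkpoint, the filtered dictionary would be passed on as `model.load_state_dict(drop_attention_buffers(checkpoint["model"]))`. This only helps when the installed transformers version does not expect those buffers in the state dict.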

All the weights for Vote2Cap-DETR can be found under this folder.

8reaks commented 9 months ago

Thank you. It works! But with the weights of Vote2Cap-DETR++, I still get the following error:

"RuntimeError: Error(s) in loading state_dict for CaptionNet: size mismatch for captioner.transformer.transformer.wpe.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([64, 256])."

Moreover, for captioning, when I look at the evaluation results of Vote2Cap-DETR and visualize them, there seem to be far fewer bounding boxes with captions than in the ground truth. Is this result correct?

ch3cook-fdu commented 9 months ago

Sorry, I have not yet released the code for the Vote2Cap-DETR++ model, so its weights cannot currently be loaded properly.
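
Note that strict=False in PyTorch only tolerates missing and unexpected keys; a parameter whose shape differs between the checkpoint and the model still raises the size-mismatch error quoted earlier in this thread. In GPT-2, wpe is the position embedding table, so 128 versus 64 rows suggests the checkpoint was built with a longer maximum sequence length than the released code. A minimal sketch for listing such differences before loading is given below. The helper name `diff_state_dicts` is hypothetical, and it works on plain name-to-shape mappings so it runs without PyTorch.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_state_dicts(
    model_shapes: Mapping[str, Shape],
    ckpt_shapes: Mapping[str, Shape],
) -> Dict[str, List]:
    """Compare parameter names and shapes of a model and a checkpoint."""
    # Keys the model expects but the checkpoint lacks.
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    # Keys the checkpoint carries but the model does not define.
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    # Shared keys with different shapes: (name, checkpoint shape, model shape).
    mismatched = sorted(
        (k, tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in model_shapes
        if k in ckpt_shapes and tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Toy shapes mirroring the error message. With PyTorch, such mappings can be
# built as {k: tuple(v.shape) for k, v in model.state_dict().items()}.
model_shapes = {"captioner.transformer.transformer.wpe.weight": (64, 256)}
ckpt_shapes = {"captioner.transformer.transformer.wpe.weight": (128, 256)}

report = diff_state_dicts(model_shapes, ckpt_shapes)
for name, ckpt_shape, model_shape in report["mismatched"]:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Any entry under "mismatched" means the model definition itself has to match the checkpoint, which no loading flag can fix.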

As for the visualization, this is unusual. Please try the following command:

python demo.py --use_color --use_normal --dataset scene_scanrefer --vocabulary scanrefer --use_beam_search --detector detector_Vote2Cap_DETR --captioner captioner_dcc --test_ckpt path_to_your_weight.pth

This will generate some .json files that can be used to visualize the predictions. You can use the tools in this repo to help visualize the box estimations.

8reaks commented 9 months ago

Thank you so much!