crockwell / Cap3D

[NeurIPS 2023] Scalable 3D Captioning with Pretrained Models
https://huggingface.co/datasets/tiange/Cap3D

BLIP-2 model used for captioning #17

Closed: AndreAmaduzzi closed this issue 10 months ago

AndreAmaduzzi commented 10 months ago

Hello, thanks for your interesting work. In Section 4.2 of your paper, you mention captions generated by Cap3D in its captioning setup and by Cap3D in its VQA setup. I am wondering which BLIP-2 model you used to obtain the captions in the captioning setup (no VQA). Did you use the BLIP-2 model fine-tuned for captioning, `caption_coco_flant5xl`, or the original xxl model, `pretrain_flant5xxl`, without an input prompt?

Looking at this file, it seems you used the original model `pretrain_flant5xxl` for both setups...

Thanks in advance, Andrea

crockwell commented 10 months ago

Hi Andrea,

You are correct -- we use `pretrain_flant5xxl` in both setups.
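
For anyone landing here later, this is a minimal sketch (not our exact script) of loading that checkpoint with the Salesforce LAVIS library; the image path and the question prompt are just illustrative:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Same checkpoint for both setups: the original (non-finetuned) xxl model.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_t5", model_type="pretrain_flant5xxl", is_eval=True, device=device
)

raw_image = Image.open("render.png").convert("RGB")  # illustrative path
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Captioning setup: generate with no input prompt.
captions = model.generate({"image": image})

# VQA setup: pass a question string as the prompt (illustrative question).
answers = model.generate({
    "image": image,
    "prompt": "Question: what object is in the image? Answer:",
})

print(captions, answers)
```

The two setups differ only in whether a prompt string is passed to `generate`; the underlying checkpoint is the same.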

Best, Chris

AndreAmaduzzi commented 10 months ago

Ok! Thank you very much.

Best, Andrea