Hello, thanks for your interesting work.
In Section 4.2 of your paper, you mention captions generated by Cap3D either in its captioning setup or in its VQA setup. I am wondering which BLIP-2 model you used to obtain the captions in the captioning setup (no VQA).
Did you use the BLIP-2 model fine-tuned for captioning (`caption_coco_flant5xl`) or the original xxl model (`pretrain_flant5xxl`) without an input prompt?
Looking at this file, it seems you used the original model `pretrain_flant5xxl` for both setups.
Thanks in advance,
Andrea