aimmemotion / EmoVIT

[CVPR 2024] EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Which versions were used for Flamingo, LLaVA, BLIP-2 and InstructBLIP? #5

Open ggcr opened 3 months ago

ggcr commented 3 months ago

The paper does not explicitly state which versions of these models were used.

For example, BLIP-2 is available in different architectures and sizes:

| Model | ViT | LLM | Total Params |
| --- | --- | --- | --- |
| BLIP-2 ViT-L OPT2.7B | ViT-L | OPT2.7B | 3.1B |
| BLIP-2 ViT-g OPT2.7B | ViT-g | OPT2.7B | 3.8B |
| BLIP-2 ViT-g OPT6.7B | ViT-g | OPT6.7B | 7.8B |
| BLIP-2 ViT-L FlanT5XL | ViT-L | FlanT5XL | 3.4B |
| BLIP-2 ViT-g FlanT5XL | ViT-g | FlanT5XL | 4.1B |
| BLIP-2 ViT-g FlanT5XXL | ViT-g | FlanT5XXL | 12.1B |
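
To show why the exact variant matters for a fair comparison, here is a minimal Python sketch that encodes the BLIP-2 rows listed above and computes the spread in parameter counts. The `Blip2Variant` class and `size_spread` helper are hypothetical names introduced only for this illustration, and nothing in it indicates which variant the paper actually evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blip2Variant:
    """One BLIP-2 configuration (hypothetical helper type for this sketch)."""
    vit: str
    llm: str
    params_b: float  # total parameters in billions, as listed above


# Values copied from the list above. Which one EmoVIT used is unknown.
BLIP2_VARIANTS = [
    Blip2Variant("ViT-L", "OPT2.7B", 3.1),
    Blip2Variant("ViT-g", "OPT2.7B", 3.8),
    Blip2Variant("ViT-g", "OPT6.7B", 7.8),
    Blip2Variant("ViT-L", "FlanT5XL", 3.4),
    Blip2Variant("ViT-g", "FlanT5XL", 4.1),
    Blip2Variant("ViT-g", "FlanT5XXL", 12.1),
]


def size_spread(variants):
    """Return (smallest, largest, largest/smallest parameter ratio)."""
    smallest = min(variants, key=lambda v: v.params_b)
    largest = max(variants, key=lambda v: v.params_b)
    return smallest, largest, largest.params_b / smallest.params_b


smallest, largest, ratio = size_spread(BLIP2_VARIANTS)
print(f"{len(BLIP2_VARIANTS)} candidate BLIP-2 variants")
print(f"smallest: {smallest.vit} + {smallest.llm} ({smallest.params_b}B)")
print(f"largest:  {largest.vit} + {largest.llm} ({largest.params_b}B)")
print(f"size ratio: {ratio:.1f}x")
```

The largest listed variant has roughly four times the parameters of the smallest, so reported baseline numbers are hard to interpret without knowing which one was used.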

The same goes for Flamingo, which is available in 3B, 9B and 80B sizes, and for LLaVA...

Could you share which versions were used for evaluation?

Thanks in advance.