Closed · Haochen-Wang409 closed this 8 months ago
I have downloaded the pre-trained backbone from https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth and found that it contains 40 blocks.
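For reference, a minimal sketch to reproduce that count, assuming the file stores a flat state dict with keys like `blocks.{i}....` (if a release wraps the weights under a `model` or `module` key, index into that first):

```python
import re
import torch

# Load the EVA-CLIP-g checkpoint (assumed here to be a plain state_dict).
state_dict = torch.load("eva_vit_g.pth", map_location="cpu")

# Count the distinct transformer block indices from keys
# such as "blocks.39.mlp.fc1.weight".
block_ids = {
    int(m.group(1))
    for k in state_dict
    if (m := re.match(r"blocks\.(\d+)\.", k))
}
print(f"number of blocks: {len(block_ids)}")  # expected: 40 (indices 0-39)
```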
Sorry for the late reply. Following the settings of BLIP-2, we use the penultimate-layer feature of EVA-CLIP-g, so there are only 39 layers.
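To illustrate why this works, here is a toy sketch (not the actual SEED/EVA code; `TinyViT` and its `Linear` "blocks" are simplified placeholders): instantiating the trunk with one block fewer and loading the 40-block checkpoint with `strict=False` produces exactly the penultimate-layer output of the full model.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Toy stand-in for the EVA ViT trunk: a stack of blocks.
    Illustrative only; real blocks are attention + MLP, not a Linear."""
    def __init__(self, depth: int, dim: int = 32):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

full = TinyViT(depth=40)   # stands in for the 40-block checkpoint

# Build the truncated encoder with 39 blocks and copy the matching weights;
# the last block's weights (blocks.39.*) are simply left unused.
trunc = TinyViT(depth=39)
missing, unexpected = trunc.load_state_dict(full.state_dict(), strict=False)
print(unexpected)  # ['blocks.39.weight', 'blocks.39.bias']

x = torch.randn(2, 32)
# Penultimate-layer feature of the full model == output of the truncated one.
pen = x
for blk in full.blocks[:-1]:
    pen = blk(pen)
assert torch.allclose(pen, trunc(x))
```

Dropping the last block at construction time, rather than running it and discarding its output, also saves the compute and memory of that block.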
Impressive work!
I noticed that SEED uses a visual encoder pre-trained with EVA-CLIP-G. The original EVA-CLIP-G has 40 blocks, but SEED omits the last block (https://github.com/AILab-CVC/SEED/blob/main/models/seed_qformer/eva_vit.py#L467). Is there any special consideration behind this?