baaivision / EVA

EVA Series: Visual Representation Fantasies from BAAI
MIT License

Is there any reason why you didn't use the EVA02 architecture for EVA-CLIP models larger than 4B? #144

Closed stevenliu000 closed 5 months ago

stevenliu000 commented 5 months ago

The EVA02 paper introduces several architectural improvements, such as SwiGLU and RoPE. These modifications look promising in the paper and are also prevalent in modern transformers such as LLaMA. Although EVA02_CLIP_E is referred to as EVA02, it lacks these components, as do EVA-CLIP-8B and EVA-CLIP-18B. Is there a specific reason why you chose not to use them?
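
For readers unfamiliar with the two components named in the question, here is a minimal pure-Python sketch of a SwiGLU feed-forward layer and a rotary position embedding (RoPE). It is an illustration only, not code from the EVA repository: the function names (`silu`, `matvec`, `swiglu_ffn`, `rope`), the weight layout, and the `base=10000.0` default are assumptions chosen for the example. The RoPE shown is the 1D form common in language models; vision models typically apply it over 2D patch coordinates, which is not shown here.

```python
import math


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x)).

    A plain MLP would instead compute w_down @ act(w_up @ x); SwiGLU adds
    a second projection whose SiLU output gates the first elementwise.
    """
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def rope(x, position, base=10000.0):
    """Apply 1D rotary position embedding to a vector of even length.

    Each consecutive pair (x[i], x[i + 1]), for even i, is rotated by the
    angle position * base ** (-i / d), where d is the vector length.
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Because RoPE rotates queries and keys rather than adding a learned position vector, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n.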

Quan-Sun commented 5 months ago

@stevenliu000 We have continued to follow the model architecture and approach used in EVA-01 when scaling up from a smaller model to a larger one, as with EVA-02-E and EVA-8B/18B.