The EVA02 paper introduces several architectural improvements such as SwiGLU and RoPE. These modifications appear promising in the paper and are also prevalent in modern transformers, such as LLaMA. Despite EVA02_CLIP_E being referred to as EVA02, it lacks these components, as do EVA-CLIP-8B and EVA-CLIP-18B. Is there any specific reason why you chose not to use them?
@stevenliu000 We have continued to follow the model architecture and approach used in EVA-01 when scaling up from a smaller model to a larger one, as with EVA-02-E and EVA-8B/18B.