SHI-Labs / CuMo

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Apache License 2.0
132 stars, 11 forks

support Llama-3 models #4

Closed: chricro closed this issue 4 months ago

chricro commented 4 months ago

Hi,

Thank you for your work. Do you plan to release CuMo-Llama-3 models (8B / 70B) instead of Mistral? It could improve performance even further. What do you think?

chrisjuniorli commented 4 months ago

Unfortunately, we were unable to use LLaMA models for CuMo v1 due to licensing constraints with our ByteDance collaborators. However, we encourage the open-source community to explore CuMo with these models from Meta, as we've open-sourced all related data and training code. We may also explore it in the future on the academic side.