I am so excited to see such a great open-source decoder-only MoE model. It has helped me a lot in my own research.
One thing I am curious about is whether you will release a larger model (e.g. over 100B parameters). I believe such a model would help advance a lot of related work.
Thanks a lot for the kind words! This is exactly what we want to see when doing this project!
As for a larger model, unfortunately, the answer is no. We do not have that much compute. :(