@JadeRay: The benchmarking methodology is explained here: https://github.com/databricks/dbrx/issues/9#issuecomment-2025688421 Let us continue the discussion there.
Closing this in favor of https://github.com/databricks/dbrx/issues/9
I have a question about the inference numbers posted in this blog post: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

An MoE model with 36B activated parameters and 132B total parameters should behave, at inference time, roughly like a 90B dense model on a workload of 2000 prompt tokens and 256 output tokens. How can it always perform better than the Llama2-70B dense model? As the batch size increases, it should outperform Llama2-70B at first, but fall behind from a batch size of about 3 or 4, because more and more experts are activated until effectively all 132B parameters have to be loaded.
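To make the batch-size argument concrete, here is a rough back-of-the-envelope sketch (my own, not from the blog or the benchmark) of the expected weight traffic per decode step. It assumes DBRX's published configuration of 16 experts with top-4 routing, and that routing is uniform and independent across tokens; real routers are load-balanced but not uniform, and KV-cache and activation traffic are ignored, so this only illustrates the trend, not an exact crossover point:

```python
# Rough estimate of weight parameters read per decode step for a
# DBRX-style MoE (132B total / 36B active, 16 experts, top-4 routing)
# versus a 70B dense model. Uniform independent routing is assumed.

TOTAL_PARAMS = 132e9   # DBRX total parameters
ACTIVE_PARAMS = 36e9   # DBRX parameters active per token
NUM_EXPERTS = 16       # experts per MoE layer
TOP_K = 4              # experts routed per token
DENSE_PARAMS = 70e9    # Llama2-70B for comparison

# Split DBRX into expert weights and shared (attention/embedding) weights:
#   shared + (TOP_K / NUM_EXPERTS) * expert = ACTIVE_PARAMS
#   shared + expert                         = TOTAL_PARAMS
expert_params = (TOTAL_PARAMS - ACTIVE_PARAMS) / (1 - TOP_K / NUM_EXPERTS)
shared_params = TOTAL_PARAMS - expert_params

def moe_params_touched(batch_size: int) -> float:
    """Expected parameters read per decode step at a given batch size.

    With uniform routing, the chance that a given expert is picked by
    at least one of the batch's tokens is 1 - (1 - k/E)^batch_size.
    """
    p_expert_used = 1 - (1 - TOP_K / NUM_EXPERTS) ** batch_size
    return shared_params + p_expert_used * expert_params

for b in range(1, 9):
    moe = moe_params_touched(b)
    marker = "<- more weight traffic than dense" if moe > DENSE_PARAMS else ""
    print(f"batch={b}: MoE touches ~{moe / 1e9:5.1f}B vs dense 70.0B {marker}")
```

Under these assumptions the MoE reads ~36B parameters at batch size 1 but crosses the dense model's 70B around batch size 3, which is the crossover the comment describes; the linked issue discusses why the measured throughput can still differ from this simple weight-traffic model.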