@JadeRay: The benchmarking methodology is explained here: https://github.com/databricks/dbrx/issues/9#issuecomment-2025688421 Let us continue the discussion there.
Closing this in favor of https://github.com/databricks/dbrx/issues/9
I have a question about the inference numbers posted in this blog post: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

An MoE model with 36B activated parameters and 132B total parameters should behave, at inference time, roughly like a 90B dense model on a workload of 2000 prompt tokens and 256 output tokens. How can it always perform better than the Llama2-70B dense model? As the batch size increases, it should outperform Llama2-70B at first, but fall behind from a batch size of about 3 or 4, because more and more experts are activated until effectively all 132B parameters have to be loaded.
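To make the batch-size argument concrete, here is a rough back-of-the-envelope sketch (my own, not from the blog or the benchmark) of the expected weight traffic per decode step. It assumes DBRX's published configuration of 16 experts with top-4 routing, and that routing is uniform and independent across tokens; real routers are load-balanced but not uniform, and KV-cache and activation traffic are ignored, so this only illustrates the trend, not an exact crossover point:

```python
# Rough estimate of weight parameters read per decode step for a
# DBRX-style MoE (132B total / 36B active, 16 experts, top-4 routing)
# versus a 70B dense model. Uniform independent routing is assumed.

TOTAL_PARAMS = 132e9   # DBRX total parameters
ACTIVE_PARAMS = 36e9   # DBRX parameters active per token
NUM_EXPERTS = 16       # experts per MoE layer
TOP_K = 4              # experts routed per token
DENSE_PARAMS = 70e9    # Llama2-70B for comparison

# Split DBRX into expert weights and shared (attention/embedding) weights:
#   shared + (TOP_K / NUM_EXPERTS) * expert = ACTIVE_PARAMS
#   shared + expert                         = TOTAL_PARAMS
expert_params = (TOTAL_PARAMS - ACTIVE_PARAMS) / (1 - TOP_K / NUM_EXPERTS)
shared_params = TOTAL_PARAMS - expert_params

def moe_params_touched(batch_size: int) -> float:
    """Expected parameters read per decode step at a given batch size.

    With uniform routing, the chance that a given expert is picked by
    at least one of the batch's tokens is 1 - (1 - k/E)^batch_size.
    """
    p_expert_used = 1 - (1 - TOP_K / NUM_EXPERTS) ** batch_size
    return shared_params + p_expert_used * expert_params

for b in range(1, 9):
    moe = moe_params_touched(b)
    marker = "<- more weight traffic than dense" if moe > DENSE_PARAMS else ""
    print(f"batch={b}: MoE touches ~{moe / 1e9:5.1f}B vs dense 70.0B {marker}")
```

Under these assumptions the MoE reads ~36B parameters at batch size 1 but crosses the dense model's 70B around batch size 3, which is the crossover the comment describes; the linked issue discusses why the measured throughput can still differ from this simple weight-traffic model.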