ggerganov / llama.cpp

LLM inference in C/C++

Support JetMoE #6499

Open EwoutH opened 4 months ago

EwoutH commented 4 months ago

A very interesting new open-source model just dropped, called JetMoE. It looks like the current SOTA as far as compute efficiency goes.

It has a very interesting model architecture:

Model Details: JetMoE-8B has 24 blocks. Each block has two MoE layers: Mixture of Attention heads (MoA) and Mixture of MLP Experts (MoE). Each MoA and MoE layer has 8 experts, and 2 experts are activated for each input token. It has 8 billion parameters in total and 2.2B active parameters. JetMoE-8B is trained on 1.25T tokens from publicly available datasets, with a learning rate of 5.0 × 10⁻⁴ and a global batch size of 4M tokens.


| Model | Active Params | Training Tokens | MBPP | Open LLM Leaderboard Avg | ARC | HellaSwag | MMLU | TruthfulQA | WinoGrande | GSM8K |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemma-2B | 2B | 2T | 28.0 | 46.4 | 48.4 | 71.8 | 41.8 | 33.1 | 66.3 | 16.9 |
| DeepseekMoE-16B | 2.8B | 2T | 34.0 | 51.1 | 53.2 | 79.8 | 46.3 | 36.1 | 73.7 | 17.3 |
| LLaMA2-7B | 7B | 2T | 20.8 | 51.0 | 53.1 | 78.6 | 46.9 | 38.8 | 74.0 | 14.5 |
| LLaMA-13B | 13B | 1T | 22.0 | 51.4 | 56.2 | 80.9 | 47.7 | 39.5 | 76.2 | 7.6 |
| JetMoE-8B | 2.2B | 1.25T | 34.2 | 53.0 | 48.7 | 80.5 | 49.2 | 41.7 | 70.2 | 27.8 |
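For context on what support in llama.cpp would involve, the core of the architecture described above is top-2-of-8 expert routing in both the attention (MoA) and MLP (MoE) layers. Below is a minimal, standalone C++ sketch of just the routing step, not llama.cpp code; the `expert_forward` placeholder and the renormalization of the selected gate weights are assumptions made for illustration.

```cpp
// Sketch of JetMoE-style top-2-of-8 expert routing (illustrative only).
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int N_EXPERTS = 8; // experts per MoA/MoE layer (from the model card)
constexpr int TOP_K     = 2; // experts activated per token (from the model card)

// Hypothetical expert: any function mapping an input vector to an output vector.
static std::vector<float> expert_forward(int expert_id, const std::vector<float> & x) {
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * (1.0f + 0.1f * expert_id); // placeholder computation
    }
    return y;
}

// Route one token: softmax the router logits, pick the top-2 experts,
// and mix their outputs weighted by the (renormalized) gate values.
static std::vector<float> moe_forward(const std::array<float, N_EXPERTS> & router_logits,
                                      const std::vector<float> & x) {
    // softmax over the 8 router logits
    const float max_logit = *std::max_element(router_logits.begin(), router_logits.end());
    std::array<float, N_EXPERTS> probs{};
    float sum = 0.0f;
    for (int e = 0; e < N_EXPERTS; ++e) {
        probs[e] = std::exp(router_logits[e] - max_logit);
        sum += probs[e];
    }
    for (float & p : probs) p /= sum;

    // indices of the TOP_K highest-probability experts
    std::array<int, N_EXPERTS> idx;
    for (int e = 0; e < N_EXPERTS; ++e) idx[e] = e;
    std::partial_sort(idx.begin(), idx.begin() + TOP_K, idx.end(),
                      [&](int a, int b) { return probs[a] > probs[b]; });

    // renormalize the selected gates (assumption) and mix the expert outputs
    float gate_sum = 0.0f;
    for (int k = 0; k < TOP_K; ++k) gate_sum += probs[idx[k]];

    std::vector<float> out(x.size(), 0.0f);
    for (int k = 0; k < TOP_K; ++k) {
        const int   e = idx[k];
        const float w = probs[e] / gate_sum;
        const std::vector<float> y = expert_forward(e, x);
        for (size_t i = 0; i < x.size(); ++i) out[i] += w * y[i];
    }
    return out;
}

int main() {
    const std::array<float, N_EXPERTS> logits = {0.1f, 2.0f, -1.0f, 0.5f, 1.5f, 0.0f, -0.5f, 0.3f};
    const std::vector<float> x = {1.0f, 2.0f, 3.0f};
    for (float v : moe_forward(logits, x)) printf("%.3f ", v);
    printf("\n");
    return 0;
}
```

Note that the MoA side would route between attention heads rather than MLP experts, so an actual implementation needs more than the sketch above; this is only meant to show how small the active compute per token is (2 of 8 experts per layer).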
github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

EwoutH commented 3 months ago

Could this issue be reopened?

sorasoras commented 2 months ago

@ggerganov Have you looked into this architecture? A small MoE is interesting.

sorasoras commented 1 month ago

Can this request be reopened?