ggerganov / llama.cpp

LLM inference in C/C++

Support JetMoE #6499

Open EwoutH opened 4 months ago

EwoutH commented 4 months ago

A very interesting new open-source model just dropped, called JetMoE. It looks like the current SOTA as far as compute efficiency goes.

It has a very interesting model architecture:

Model Details: JetMoE-8B has 24 blocks. Each block has two MoE layers: Mixture of Attention heads (MoA) and Mixture of MLP Experts (MoE). Each MoA and MoE layer has 8 experts, and 2 experts are activated for each input token. It has 8 billion parameters in total and 2.2B active parameters. JetMoE-8B is trained on 1.25T tokens from publicly available datasets, with a learning rate of 5.0 × 10⁻⁴ and a global batch size of 4M tokens.


| Model | Active Params | Training Tokens | MBPP | Open LLM Leaderboard Avg | ARC | HellaSwag | MMLU | TruthfulQA | WinoGrande | GSM8K |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemma-2B | 2B | 2T | 28.0 | 46.4 | 48.4 | 71.8 | 41.8 | 33.1 | 66.3 | 16.9 |
| DeepseekMoE-16B | 2.8B | 2T | 34.0 | 51.1 | 53.2 | 79.8 | 46.3 | 36.1 | 73.7 | 17.3 |
| LLaMA2-7B | 7B | 2T | 20.8 | 51.0 | 53.1 | 78.6 | 46.9 | 38.8 | 74.0 | 14.5 |
| LLaMA-13B | 13B | 1T | 22.0 | 51.4 | 56.2 | 80.9 | 47.7 | 39.5 | 76.2 | 7.6 |
| JetMoE-8B | 2.2B | 1.25T | 34.2 | 53.0 | 48.7 | 80.5 | 49.2 | 41.7 | 70.2 | 27.8 |
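For context on what support in llama.cpp would involve, the core of the architecture described above is top-2-of-8 expert routing in both the attention (MoA) and MLP (MoE) layers. Below is a minimal, standalone C++ sketch of just the routing step, not llama.cpp code; the `expert_forward` placeholder and the renormalization of the selected gate weights are assumptions made for illustration.

```cpp
// Sketch of JetMoE-style top-2-of-8 expert routing (illustrative only).
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int N_EXPERTS = 8; // experts per MoA/MoE layer (from the model card)
constexpr int TOP_K     = 2; // experts activated per token (from the model card)

// Hypothetical expert: any function mapping an input vector to an output vector.
static std::vector<float> expert_forward(int expert_id, const std::vector<float> & x) {
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * (1.0f + 0.1f * expert_id); // placeholder computation
    }
    return y;
}

// Route one token: softmax the router logits, pick the top-2 experts,
// and mix their outputs weighted by the (renormalized) gate values.
static std::vector<float> moe_forward(const std::array<float, N_EXPERTS> & router_logits,
                                      const std::vector<float> & x) {
    // softmax over the 8 router logits
    const float max_logit = *std::max_element(router_logits.begin(), router_logits.end());
    std::array<float, N_EXPERTS> probs{};
    float sum = 0.0f;
    for (int e = 0; e < N_EXPERTS; ++e) {
        probs[e] = std::exp(router_logits[e] - max_logit);
        sum += probs[e];
    }
    for (float & p : probs) p /= sum;

    // indices of the TOP_K highest-probability experts
    std::array<int, N_EXPERTS> idx;
    for (int e = 0; e < N_EXPERTS; ++e) idx[e] = e;
    std::partial_sort(idx.begin(), idx.begin() + TOP_K, idx.end(),
                      [&](int a, int b) { return probs[a] > probs[b]; });

    // renormalize the selected gates (assumption) and mix the expert outputs
    float gate_sum = 0.0f;
    for (int k = 0; k < TOP_K; ++k) gate_sum += probs[idx[k]];

    std::vector<float> out(x.size(), 0.0f);
    for (int k = 0; k < TOP_K; ++k) {
        const int   e = idx[k];
        const float w = probs[e] / gate_sum;
        const std::vector<float> y = expert_forward(e, x);
        for (size_t i = 0; i < x.size(); ++i) out[i] += w * y[i];
    }
    return out;
}

int main() {
    const std::array<float, N_EXPERTS> logits = {0.1f, 2.0f, -1.0f, 0.5f, 1.5f, 0.0f, -0.5f, 0.3f};
    const std::vector<float> x = {1.0f, 2.0f, 3.0f};
    for (float v : moe_forward(logits, x)) printf("%.3f ", v);
    printf("\n");
    return 0;
}
```

Note that the MoA side would route between attention heads rather than MLP experts, so an actual implementation needs more than the sketch above; this is only meant to show how small the active compute per token is (2 of 8 experts per layer).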
github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

EwoutH commented 3 months ago

Could this issue be reopened?

sorasoras commented 2 months ago

@ggerganov Have you looked into this architecture? A small MoE is interesting.

sorasoras commented 1 month ago

Can this request be reopened?