NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start
Other
10.62k stars 2.38k forks

[QUESTION] DeepSeek-V2 compatibility? #1295

Open wavy-jung opened 1 day ago

wavy-jung commented 1 day ago

Assuming first_k_dense_replace is not used, is the current code fully compatible with DeepSeek-V2 (MoE + multi-latent attention)?
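For readers unfamiliar with the setting, here is a minimal sketch of what first_k_dense_replace is usually taken to mean. It assumes the semantics of the DeepSeek-V2 Hugging Face config, where the first k decoder layers keep a dense MLP and later layers use MoE. The helper name `moe_layer_pattern` and its defaults are hypothetical and are not part of Megatron-LM.

```python
from typing import List


def moe_layer_pattern(
    num_layers: int,
    first_k_dense_replace: int = 0,
    moe_layer_freq: int = 1,
) -> List[int]:
    """Return 1 for layers assumed to be MoE and 0 for dense layers.

    Assumption: a layer is MoE when its index is at least
    first_k_dense_replace and is a multiple of moe_layer_freq.
    """
    if num_layers < 0 or first_k_dense_replace < 0:
        raise ValueError("layer counts must be non-negative")
    if moe_layer_freq < 1:
        raise ValueError("moe_layer_freq must be >= 1")
    return [
        1 if (i >= first_k_dense_replace and i % moe_layer_freq == 0) else 0
        for i in range(num_layers)
    ]


# With no first_k_dense_replace (the case asked about), every layer is MoE.
all_moe = moe_layer_pattern(4)

# With first_k_dense_replace=1, the first layer stays dense.
one_dense = moe_layer_pattern(4, first_k_dense_replace=1)
```

Under that assumption, the question covers only the all-MoE case, where no leading dense layers have to be expressed in the layer layout.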