Open wavy-jung opened 1 day ago
Assuming there is no `first_k_dense_replace`, is the current code fully compatible with deepseek-v2 (MoE + multi-head latent attention)?