-
In megatron_moe, route_logits is implemented as top-k first, then softmax; in your team's modeling_qwen2_moe.py, route_logits is computed as softmax first, then top-k, with a parameter norm_topk_prob controlling whether the selected probabilities are renormalized.
In qwen2-moe, norm_topk_prob is false, which causes the router_logits converted from Megatron to have the wrong magnitude (m…
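The difference between the two routing orders can be sketched in a few lines of plain Python (an illustrative sketch only, not Megatron's or Qwen2-MoE's actual code):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def topk_then_softmax(logits, k):
    # Megatron-style order: pick the top-k logits first, then softmax
    # over only those. The resulting weights always sum to 1.
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return idx, softmax([logits[i] for i in idx])

def softmax_then_topk(logits, k, norm_topk_prob=False):
    # Qwen2-MoE-style order: softmax over all experts, then keep the
    # top-k probabilities. With norm_topk_prob=False the kept weights
    # sum to less than 1, which is the magnitude mismatch described above.
    probs = softmax(logits)
    idx = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    w = [probs[i] for i in idx]
    if norm_topk_prob:
        s = sum(w)
        w = [x / s for x in w]
    return idx, w
```

Note that softmax-then-top-k with norm_topk_prob=True is mathematically identical to top-k-then-softmax (the full-softmax denominator cancels under renormalization), which is why flipping that flag reconciles the two conventions.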
-
I am using a vLLM server to deploy a MoE model. However, this model has a large number of experts and the number of activated experts is very small, so it is very well suited to the expert offload…
-
Dear professors and experts, I have a question I would like to consult you about: I found that the memory required for single-threaded SBAS processing was too large to satisfy when I was conducting SBAS-…
-
Fix the title of exo1
-
Assignment 3
ELEM DAVID OBIAHU
2022/HND/35291/CS
Question: How does an expert system resolve rule-base conflicts?
Answer
Expert systems resolve rule-base conflicts in several ways, including:
A.…
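Common conflict-resolution strategies in rule-based systems include explicit rule priority (salience), specificity of the rule's conditions, and recency of the matched facts. A minimal sketch of priority-plus-specificity resolution (all names here are illustrative, not from any particular expert-system shell):

```python
# Minimal sketch of conflict resolution in a rule-based expert system.
# When several rules match the same facts, pick one by (a) explicit
# priority (salience), breaking ties by (b) specificity, i.e. how many
# conditions the rule requires.

class Rule:
    def __init__(self, name, conditions, salience=0):
        self.name = name
        self.conditions = conditions  # set of facts that must all hold
        self.salience = salience      # higher fires first

    def matches(self, facts):
        return self.conditions <= facts

def resolve_conflict(rules, facts):
    """Return the single rule to fire from the conflict set, or None."""
    conflict_set = [r for r in rules if r.matches(facts)]
    if not conflict_set:
        return None
    # Priority first, then specificity (more conditions = more specific).
    return max(conflict_set, key=lambda r: (r.salience, len(r.conditions)))
```

For example, a rule requiring both "fever" and "cough" beats a rule requiring only "fever" when both facts are present, because it is more specific.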
-
Hi there, thanks for mergoo, an amazing code base for MoE model construction.
A crucial feature that may need to be implemented is that mergoo should let the user select the basic routing policy when c…
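One way to expose such a choice is a registry of routing policies that the MoE layer looks up by name at construction time. A hedged sketch (the class and policy names are hypothetical, not the mergoo API; only the idea of selecting a policy at construction time comes from the request):

```python
# Sketch of a construction-time routing-policy switch: policies are
# registered by name, and the MoE layer looks one up when it is built.
# All names here are illustrative placeholders.

ROUTING_POLICIES = {}

def register_policy(name):
    def deco(fn):
        ROUTING_POLICIES[name] = fn
        return fn
    return deco

@register_policy("round_robin")
def round_robin_routing(num_experts, token_id, k):
    # Trivial baseline policy: cycle over experts deterministically.
    return [(token_id + i) % num_experts for i in range(k)]

@register_policy("hash")
def hash_routing(num_experts, token_id, k):
    # Another simple deterministic policy based on hashing.
    return [hash((token_id, i)) % num_experts for i in range(k)]

class MoELayer:
    def __init__(self, num_experts, k, routing_policy="round_robin"):
        if routing_policy not in ROUTING_POLICIES:
            raise ValueError(f"unknown routing policy: {routing_policy}")
        self.num_experts = num_experts
        self.k = k
        self.route = ROUTING_POLICIES[routing_policy]

    def experts_for(self, token_id):
        return self.route(self.num_experts, token_id, self.k)
```

The registry keeps policy implementations decoupled from the layer itself, so a user could plug in a learned top-k router under the same interface.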
-
Can we implement an expert-parallel strategy for MoE to fully exploit the sparse-activation property? Ideally, MoE should only use compute on the order of its active parameters, but the current implement…
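The sparse-activation argument can be made concrete with a toy dispatch loop: each expert processes only the tokens routed to it, so the work done scales with tokens × k, not tokens × num_experts. A sketch under that assumption (names are illustrative, not the project's API):

```python
# Toy illustration of sparse expert dispatch: compute scales with the
# number of *active* expert-token pairs (tokens * k), independent of
# the total expert count.

def dispatch(token_assignments, num_experts):
    """Group token ids by the expert each one was routed to."""
    buckets = [[] for _ in range(num_experts)]
    for token_id, expert_id in token_assignments:
        buckets[expert_id].append(token_id)
    return buckets

def moe_forward(tokens, assignments, num_experts, expert_fn):
    """Run each expert only on its own bucket; also count expert calls."""
    buckets = dispatch(assignments, num_experts)
    out = {}
    calls = 0
    for expert_id, bucket in enumerate(buckets):
        for token_id in bucket:
            out[token_id] = expert_fn(expert_id, tokens[token_id])
            calls += 1
    return out, calls
```

With 4 tokens routed to one expert each (k = 1), the loop performs 4 expert calls regardless of whether the layer holds 8 or 8,000 experts, which is the property expert parallelism should preserve across devices.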
-
# _Assignees_
###### Since our private repository is on the free plan, we cannot use the full GitHub team functionality (such as adding multiple Assignees to an issue). After evaluation, we have decide…
-
```
assert not args.model_parallel.fp16, \
"Expert parallelism is not supported with fp16 training."
```
from https://github.com/NVIDIA/Megatron-LM/blob/db3a3f79d1cda60ea4b3db0ceffcf…
-
The expert needs a way to assign keywords to the papers he is responsible for.
## Expected Behavior
Under the assumption that the papers are in his basket, we need a special editor where the expert…