
HyunWookL / TESTAM

Official Code of TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts

MIT License


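Does your MoE sparsify its input? From your code, it looks like all of the data is fed to every expert, and the output is then taken from the top-1 expert. Is that right? #4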

Open lilailai688 opened 1 month ago

HyunWookL commented 1 month ago


Thank you for your question! I'll use a translator for this answer. Yes, you are right. Each expert receives the same input, and the gating network is trained to select the top-1 expert.
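The difference between the dense scheme confirmed above and a sparse one can be illustrated with a minimal, framework-free sketch. It assumes a hypothetical linear gate (`gate_weights`) and toy experts that just scale their input; it is not TESTAM's implementation and covers only the forward pass (how the gate is trained is not shown). `dense_top1` mirrors the answer: every expert gets the same input and only the top-1 output is kept. `sparse_top1` routes first and runs only the selected expert.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Vector) -> Vector:
    # Numerically stable softmax over the per-expert gate logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_scores(x: Vector, gate_weights: List[Vector]) -> Vector:
    # Hypothetical linear gate: one logit per expert (dot product with x).
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    return softmax(logits)


def argmax(values: Vector) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def dense_top1(x: Vector, experts: List[Expert], gate_weights: List[Vector]) -> Vector:
    # Dense compute: every expert processes the same input...
    outputs = [expert(x) for expert in experts]
    # ...and only the output of the top-1 expert is kept.
    return outputs[argmax(gate_scores(x, gate_weights))]


def sparse_top1(x: Vector, experts: List[Expert], gate_weights: List[Vector]) -> Vector:
    # Sparse compute: pick the top-1 expert first, then run only that expert.
    return experts[argmax(gate_scores(x, gate_weights))](x)


# Usage: toy experts that count how often they are called.
calls = [0, 0, 0]


def make_expert(idx: int, scale: float) -> Expert:
    def expert(x: Vector) -> Vector:
        calls[idx] += 1
        return [scale * v for v in x]
    return expert


experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0])]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical gate parameters
x = [0.5, 2.0]

y_dense = dense_top1(x, experts, gate_weights)
dense_calls = sum(calls)

calls[:] = [0, 0, 0]
y_sparse = sparse_top1(x, experts, gate_weights)
sparse_calls = sum(calls)

print(y_dense, dense_calls)    # [1.0, 4.0] 3
print(y_sparse, sparse_calls)  # [1.0, 4.0] 1
```

Both variants return the same output; they differ only in how many experts are evaluated. With dense routing the compute grows with the number of experts even though only one output is used, while sparse routing avoids that cost at the price of dispatching each input to its selected expert.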