Closed jiaxiangc closed 10 months ago
Which dataset did you use for training? If you used test.py directly, then it is using dummy data with random integers (the aim of that file is just to illustrate how to use the MoE layer).
Did you try changing the loss function, or combining the MoE layer with other layers (such as linear layers, Conv layers, LSTM, etc.)?
I usually use this layer together with other nn modules, for example using MoE as the head of a huggingface transformer, or implementing a custom self-attention layer with MoE (replacing the output linear layer with MoE).
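To make the idea above concrete, here is a minimal, self-contained sketch of a top-k gated mixture-of-experts over linear experts in NumPy. This is not the repository's implementation; the class name `TinyMoE`, its constructor parameters, and the gating scheme are all illustrative assumptions, shown only to clarify how a gate selects and mixes experts the way an MoE head or MoE output layer would.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TinyMoE:
    """Illustrative top-k mixture of linear experts (NOT the repo's implementation)."""
    def __init__(self, d_in, d_out, n_experts=4, k=2):
        self.k = k
        # gating network: one linear map from input to per-expert logits
        self.w_gate = rng.normal(0, 0.1, (d_in, n_experts))
        # each expert is a plain linear map d_in -> d_out
        self.experts = [rng.normal(0, 0.1, (d_in, d_out)) for _ in range(n_experts)]

    def __call__(self, x):
        # x: (batch, d_in)
        probs = softmax(x @ self.w_gate)              # (batch, n_experts)
        out = np.zeros((x.shape[0], self.experts[0].shape[1]))
        for b in range(x.shape[0]):
            top = np.argsort(probs[b])[-self.k:]      # indices of the top-k experts
            w = probs[b, top] / probs[b, top].sum()   # renormalise their gate weights
            for wi, ei in zip(w, top):
                out[b] += wi * (x[b] @ self.experts[ei])
        return out

# Example: use the MoE as a "head" on top of some upstream features
moe = TinyMoE(d_in=8, d_out=3, n_experts=4, k=2)
features = rng.normal(size=(5, 8))                    # e.g. transformer pooled output
y = moe(features)
print(y.shape)  # (5, 3)
```

In practice you would swap the random weight matrices for trainable `nn.Linear` modules and let the gate and experts learn jointly; the routing and weighted-mixing logic stays the same.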
Closing this issue due to inactivity. Please feel free to reopen it if you have any further questions.
I tried changing the number of experts, but I find it does not work well no matter how many experts I set.
For example, with n=10 the accuracy is 46% after 100 epochs; with n=3 it is 47% after 100 epochs; with n=1 it is 49% after 100 epochs. So I want to ask: is the code wrong?