YeonwooSung / Pytorch_mixture-of-experts

PyTorch implementation of MoE (Mixture of Experts)

This MoE is not useful. #1

Closed: jiaxiangc closed this issue 10 months ago

jiaxiangc commented 1 year ago

I tried changing the number of experts, but I find it does not work well no matter how many experts I set.

For example, when n=10, the accuracy is 46% after 100 epochs; when n=3, the accuracy is 47% after 100 epochs; when n=1, the accuracy is 49% after 100 epochs. So I want to ask whether the code is wrong.

YeonwooSung commented 1 year ago

Which dataset did you use for training? If you ran test.py directly, it only uses dummy data with random integers (the aim of that file is just to illustrate how to use the MoE layer).
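For context, here is a minimal sketch of what such an illustrative setup typically looks like: a toy soft-gated MoE layer trained on random dummy data. The `ToyMoE` class, its dimensions, and the data shapes are assumptions for illustration, not the exact API of this repository's layer or its test.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Minimal soft-gated mixture-of-experts layer (illustrative sketch only)."""
    def __init__(self, input_dim, output_dim, num_experts):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(input_dim, output_dim) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, x):
        # (batch, num_experts) gating weights
        weights = F.softmax(self.gate(x), dim=-1)
        # (batch, num_experts, output_dim) outputs of all experts
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=1)
        # weighted sum over the expert dimension
        return torch.einsum("be,beo->bo", weights, expert_out)

# dummy data with random integer labels, analogous to what test.py illustrates
x = torch.randn(32, 16)
y = torch.randint(0, 4, (32,))
moe = ToyMoE(input_dim=16, output_dim=4, num_experts=3)
loss = F.cross_entropy(moe(x), y)
loss.backward()
```

With purely random inputs and labels like this, accuracy numbers are essentially noise, so varying the number of experts is not expected to show a clear trend.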

Did you try changing the loss function, or combining the MoE layer with other layers (such as a linear layer, Conv layers, an LSTM, etc.)?

I usually use this layer together with other nn modules, for example using MoE as a head on top of a Hugging Face transformer, or implementing a custom self-attention layer with MoE (replacing the output linear projection with MoE).
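As one hedged sketch of the second idea, the block below replaces the output projection of a self-attention layer with an MoE. It reuses the hypothetical `ToyMoE` class from the sketch above; the `MoESelfAttention` name, dimensions, and wiring are illustrative assumptions, not the author's implementation.

```python
import torch
import torch.nn as nn

class MoESelfAttention(nn.Module):
    """Self-attention block whose output projection is an MoE layer (sketch).

    Assumes the ToyMoE class defined in the previous sketch.
    """
    def __init__(self, embed_dim, num_heads, num_experts):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.out_moe = ToyMoE(embed_dim, embed_dim, num_experts)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        # apply the MoE output projection token-wise
        b, s, d = attn_out.shape
        return self.out_moe(attn_out.reshape(b * s, d)).reshape(b, s, d)

tokens = torch.randn(2, 10, 64)  # (batch, seq_len, embed_dim)
block = MoESelfAttention(embed_dim=64, num_heads=4, num_experts=3)
out = block(tokens)  # (2, 10, 64)
```

The same pattern works for the head-model case: feed a transformer's pooled or last hidden states into an MoE layer instead of a plain linear classifier.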

YeonwooSung commented 10 months ago

Closing this issue due to inactivity. Please feel free to reopen it if you have any further questions.