Thanks for sharing this, it's very clear.
I have a question about how the experts are implemented. In the paper, each expert is implemented as a multi-layer network (MLP), but I only see a single layer here. Am I misunderstanding something?
Looking forward to your reply, thank you!