What is the performance difference between the token gate and the sentence gate?

SimiaoZuo / MoEBERT

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Apache License 2.0

Open GeneZC opened 2 years ago

GeneZC commented 2 years ago

How does performance differ between the token gate and the sentence gate? Also, what value of alpha did you use for the load-balancing loss?
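
For context, here is a rough sketch of the two routing granularities and the loss weight in question. It is a pure-Python illustration, not MoEBERT's code. It assumes a softmax router with top-1 selection, a sentence gate that averages the token logits before routing, and a Switch-Transformer-style auxiliary loss of the form alpha * N * sum_i(f_i * P_i). All function names here are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=xs.__getitem__)


def route_tokens(logits):
    """Token gate (assumed form): every token picks its own expert.

    logits[t][e] is the router score of token t for expert e.
    """
    probs = [softmax(row) for row in logits]
    choice = [argmax(p) for p in probs]
    return probs, choice


def route_sentence(logits):
    """Sentence gate (assumed form): average the token logits, then
    send every token of the sentence to the same expert."""
    n_tokens, n_experts = len(logits), len(logits[0])
    mean = [sum(row[e] for row in logits) / n_tokens for e in range(n_experts)]
    p = softmax(mean)
    return [p] * n_tokens, [argmax(p)] * n_tokens


def load_balance_loss(probs, choice, alpha):
    """Auxiliary loss alpha * N * sum_i f_i * P_i (Switch-Transformer style).

    f_i: fraction of tokens routed to expert i.
    P_i: mean router probability assigned to expert i.
    """
    n_tokens, n_experts = len(probs), len(probs[0])
    f = [sum(1 for c in choice if c == e) / n_tokens for e in range(n_experts)]
    p_mean = [sum(p[e] for p in probs) / n_tokens for e in range(n_experts)]
    return alpha * n_experts * sum(fi * pi for fi, pi in zip(f, p_mean))


# Toy sentence: 3 tokens, 2 experts (made-up router logits).
toy_logits = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.5]]
token_probs, token_choice = route_tokens(toy_logits)
sent_probs, sent_choice = route_sentence(toy_logits)
```

In this toy case the token gate splits the sentence across both experts, while the sentence gate sends all three tokens to one expert. Under these assumptions, alpha only scales how strongly uneven routing is penalized relative to the task loss.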