Open Harzva opened 1 year ago
Describe the question: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
In the MoE method, do the experts have to be trained, or can a frozen pretrained model (such as GPT-3 or BERT) be used as an expert?
Thank you very much!
We reproduced this model in PaddlePaddle following the source code of the paper, so it cannot directly use another frozen model as an expert. It does, however, support warm-starting from a model saved in earlier epochs.
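For readers wondering what a frozen expert would mean conceptually, here is a minimal plain-Python sketch of the MMoE mixing step from the paper: each task has its own softmax gate over the shared experts, and the task input is the gate-weighted sum of expert outputs. This is not the PaddlePaddle implementation discussed above. The names `make_trainable_expert`, `frozen_expert`, and `mmoe_forward` are hypothetical, the "frozen" expert is a toy fixed function standing in for a pretrained encoder, and the per-task tower networks and all training logic are omitted.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    # Matrix-vector product: one output value per row of `weights`.
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_trainable_expert(in_dim, out_dim, rng):
    # Hypothetical expert: a single ReLU layer whose weights would be trained.
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def expert(x):
        return [max(0.0, h) for h in linear(weights, x)]

    return expert


def frozen_expert(x):
    # Hypothetical stand-in for a frozen pretrained model: a fixed function
    # with no trainable parameters. Its output size must match the others.
    return [sum(x) / len(x), max(x)]


def mmoe_forward(x, experts, gate_weights_per_task):
    # Every expert sees the same input; each task mixes them with its own gate.
    expert_outs = [expert(x) for expert in experts]
    out_dim = len(expert_outs[0])
    task_inputs = []
    for gate_w in gate_weights_per_task:
        gates = softmax(linear(gate_w, x))  # one weight per expert
        mixed = [
            sum(g * out[d] for g, out in zip(gates, expert_outs))
            for d in range(out_dim)
        ]
        task_inputs.append(mixed)
    return task_inputs


rng = random.Random(0)
in_dim, out_dim, num_tasks = 3, 2, 2
experts = [
    make_trainable_expert(in_dim, out_dim, rng),
    make_trainable_expert(in_dim, out_dim, rng),
    frozen_expert,
]
# One gate matrix per task, shaped (number of experts) x (input size).
gates = [
    [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(len(experts))]
    for _ in range(num_tasks)
]
task_inputs = mmoe_forward([0.5, -1.0, 2.0], experts, gates)
```

In this sketch the gate only needs each expert to return a vector of the same size, so a fixed function can sit alongside trainable ones. Whether a given framework implementation allows that depends on how its experts are constructed, which is the limitation described in the reply above.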