As proposed in WideNet, we could combine a MoE architecture with weight sharing across transformer blocks. Incorporating a WideNet-style architecture is expected to improve performance, shorten training time, and reduce the number of parameters needed.
This issue is about implementing such a weight-sharing scheme and benchmarking its performance.
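
To make the parameter argument concrete, here is a rough back-of-the-envelope sketch rather than a proposed implementation. It assumes one MoE feed-forward block is reused by every layer while each layer keeps its own LayerNorm, which is how WideNet is usually described. Attention weights are left out for brevity, and all function names and sizes (`d_model`, `d_ff`, `n_experts`, `n_layers`) are hypothetical, not taken from this codebase.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of one expert: two linear layers, each with a bias."""
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def moe_params(d_model: int, d_ff: int, n_experts: int) -> int:
    """Parameters of one MoE block: a bias-free linear router plus the experts."""
    router = d_model * n_experts
    return router + n_experts * ffn_params(d_model, d_ff)


def layernorm_params(d_model: int) -> int:
    """Scale and shift vectors of one LayerNorm."""
    return 2 * d_model


def total_params(d_model: int, d_ff: int, n_experts: int,
                 n_layers: int, share_moe: bool) -> int:
    """Total MoE + LayerNorm parameters for a stack of layers.

    With share_moe=True, a single MoE block is counted once and reused by
    every layer; LayerNorm parameters stay per-layer in both cases.
    """
    moe_copies = 1 if share_moe else n_layers
    return (moe_copies * moe_params(d_model, d_ff, n_experts)
            + n_layers * layernorm_params(d_model))


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    cfg = dict(d_model=512, d_ff=2048, n_experts=4, n_layers=12)
    unshared = total_params(**cfg, share_moe=False)
    shared = total_params(**cfg, share_moe=True)
    print(f"unshared: {unshared:,} parameters")
    print(f"shared:   {shared:,} parameters")
    print(f"ratio:    {unshared / shared:.1f}x")
```

Under these assumptions, sharing saves `n_layers - 1` copies of the MoE block. Whether quality and training time also improve is exactly what the benchmark in this issue would need to show.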