HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License
46 stars 6 forks

Stabilize MoE #16

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

Currently, our MoE implementation leads to exploding losses and, eventually, NaNs. This issue is about finding the cause of these problems and fixing it.
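One possible starting point for narrowing this down is to check which parameters or gradients first become non-finite. The following is only a minimal sketch, not code from this repository: the helper `nonfinite_report` and the example `grads` tree (including the `moe`/`router` names) are hypothetical and exist purely for illustration.

```python
import jax
import jax.numpy as jnp


def nonfinite_report(tree):
    """Map each leaf path to its number of NaN/Inf entries (only leaves with at least one)."""
    leaves_with_paths, _ = jax.tree_util.tree_flatten_with_path(tree)
    report = {}
    for path, leaf in leaves_with_paths:
        # Count entries that are NaN, +Inf, or -Inf in this leaf.
        bad = int(jnp.sum(~jnp.isfinite(leaf)))
        if bad:
            report[jax.tree_util.keystr(path)] = bad
    return report


# Hypothetical gradient pytree; the names are assumptions, not the real Olmax layout.
grads = {
    "dense": {"w": jnp.ones((2, 2))},
    "moe": {"router": jnp.array([1.0, float("nan"), float("inf")])},
}
report = nonfinite_report(grads)
print(report)
```

Running such a check on the gradients at every step (outside of `jit`) would show whether the MoE-specific leaves are the first to diverge. JAX's built-in `jax.config.update("jax_debug_nans", True)` is an alternative that raises an error at the first operation producing a NaN.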