Open Muennighoff opened 2 months ago
@Muennighoff Hm... this looks like an error in Triton... which version are you running? It could be an issue on their end.
I tried with 2.1.0, 2.2.0, and 2.3.0 and get the error with all of them.
Hm... would you mind providing a minimal repro? It seems to work fine on my end, so I'm wondering if it's a setup thing.
I am getting the below error on the first step of multinode training with dMoE. Meanwhile, multinode MoE training and single-node dMoE training work fine. Any ideas what the problem might be? Thanks!