Open ehartford opened 2 months ago
We'd love community PRs for this! Happy to help review and design. It's not currently on our roadmap, but we are evaluating it.
Eric, do you know that Scatter MoE is beneficial for your use case or are you interested based on the results from the paper? If the former, it would be very helpful if you could share!
I have some scripts from Shawn and it is on my list to benchmark and see if we could get some wins from their kernels. I am a bit buried though, so I am not sure when I'll get to it 😓
We know that training with Scatter MoE is much more efficient, and we would like to benefit from the cost savings.
Thanks, Eric. Can you share more about your use case so that we can include it in our analysis? Scripts to reproduce would be excellent, if possible :)
This is a feature request, not a bug that can be reproduced. The paper describing the feature I am requesting is linked in the issue description.
I would like to request a ScatterMoE feature in Megablocks.
https://arxiv.org/abs/2403.08245
https://github.com/shawntan/scattermoe