databricks / megablocks

Apache License 2.0

ScatterMoE feature #104

Open · ehartford opened this issue 2 months ago

ehartford commented 2 months ago

I would like to request ScatterMoE support in MegaBlocks.

https://arxiv.org/abs/2403.08245

https://github.com/shawntan/scattermoe

mvpatel2000 commented 2 months ago

We'd love community PRs for this! Happy to help review and design. It's not currently on our roadmap, but we are evaluating it.

tgale96 commented 2 months ago

Eric, do you know that ScatterMoE is beneficial for your use case, or are you interested based on the results in the paper? If the former, it would be very helpful if you could share details!

I have some scripts from Shawn, and benchmarking them to see whether we could get some wins from their kernels is on my list. I am a bit buried, though, so I am not sure when I'll get to it 😓

ehartford commented 2 months ago

We know that training with ScatterMoE is much more efficient, and we would like to benefit from the cost savings.

tgale96 commented 2 months ago

Thanks, Eric. Can you share more about your use case so that we can include it in our analysis? Scripts to reproduce would be excellent, if possible :)

ehartford commented 2 months ago

This is a feature request, not a bug that could be reproduced. The paper describing the feature I am requesting is linked above.