databricks / megablocks

Apache License 2.0

Import dmoe model into other training script? #101

Open andrewnc opened 3 months ago

andrewnc commented 3 months ago

Is it possible to import the dMoE model itself into another training script, without training via Megatron-LM?

tgale96 commented 2 months ago

Hi, yes! The Megatron-LM binding is just what I used for experiments. You should be able to use the dMoE layer from other codebases relatively easily. Is there a particular codebase you had in mind to integrate it into?

Here are a couple of other repos that integrate it. Both actually do something a bit more complicated than necessary because they wanted to add features specific to their frameworks.

andrewnc commented 2 months ago

I have an internal codebase with a pretty vanilla decoder-only transformer. I am hoping to swap its feed-forward layers out for a dMoE. Thank you for these pointers - it seems like a simpler version of the nanotron example is what I'll try to implement!

tgale96 commented 2 months ago

Awesome! You should be able to just use the dMoE class directly, like you would any other layer. Let me know if you run into any issues!
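
For anyone landing here later, below is a rough sketch of what "use it like any other layer" could look like in a plain PyTorch decoder block. It is not taken from the MegaBlocks docs. The import paths (`megablocks.layers.arguments.Arguments`, `megablocks.layers.dmoe.dMoE`), the `Arguments` field names, and the possibility that the layer returns an `(output, bias)` tuple are assumptions to check against the version you have installed. `make_ffn` and `FeedForwardSublayer` are hypothetical helper names, and the dense branch is only there so the block runs without MegaBlocks or a GPU.

```python
import torch
import torch.nn as nn


def make_ffn(hidden_size: int, ffn_hidden_size: int, use_dmoe: bool = False,
             num_experts: int = 8, top_k: int = 1) -> nn.Module:
    """Build a feed-forward module mapping (..., hidden) -> (..., hidden)."""
    if use_dmoe:
        # Assumption: these import paths and Arguments fields match the
        # installed MegaBlocks version; verify them against the repo.
        from megablocks.layers.arguments import Arguments
        from megablocks.layers.dmoe import dMoE

        args = Arguments(
            hidden_size=hidden_size,
            ffn_hidden_size=ffn_hidden_size,
            moe_num_experts=num_experts,
            moe_top_k=top_k,
        )
        return dMoE(args)
    # Dense baseline: the "vanilla" MLP that the dMoE layer would replace.
    return nn.Sequential(
        nn.Linear(hidden_size, ffn_hidden_size),
        nn.GELU(),
        nn.Linear(ffn_hidden_size, hidden_size),
    )


class FeedForwardSublayer(nn.Module):
    """Pre-norm residual wrapper around any feed-forward module."""

    def __init__(self, hidden_size: int, ffn: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size)
        self.ffn = ffn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.ffn(self.norm(x))
        # Assumption: some layers return (output, bias) rather than a tensor.
        # If so, add the bias here so both kinds of layer are interchangeable.
        if isinstance(out, tuple):
            out, bias = out
            if bias is not None:
                out = out + bias
        return x + out


# Dense layer on CPU; pass use_dmoe=True to try the MegaBlocks layer instead.
layer = FeedForwardSublayer(16, make_ffn(16, 64))
y = layer(torch.randn(2, 5, 16))  # (batch, sequence, hidden)
```

Keeping the choice of layer inside one factory means the rest of the training script does not need to know which one is in use. The MegaBlocks path will most likely need a CUDA device and a matching dtype, and this sketch ignores any auxiliary load-balancing loss.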