andrewnc opened 3 months ago
Hi, yes! The Megatron-LM binding is just what I used for experiments. You should be able to use the dMoE layer from other codebases relatively easily. Is there a particular codebase you have in mind for the integration?
Here are a couple other repos that integrate it. These both actually do something a bit more complicated than necessary because they wanted to add features specific to their frameworks.
I have an internal codebase with a pretty vanilla decoder-only transformer. I am hoping to swap that out for a dMoE. Thank you for these pointers - it seems like a simpler version of the nanotron example is what I'll try to implement!
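Since the plan above is just to swap a block's feed-forward layer for a dMoE, here is a minimal, dependency-free sketch of that drop-in pattern. `Block` and `DenseFFN` are hypothetical stand-ins for the internal codebase, not megablocks classes; the only contract the replacement layer has to keep is mapping hidden states to hidden states of the same shape.

```python
class DenseFFN:
    """Stand-in for the vanilla feed-forward layer (here: pointwise doubling)."""
    def __call__(self, hidden):
        return [2.0 * h for h in hidden]


class Block:
    """Stand-in decoder block: the FFN is an injected, shape-preserving callable."""
    def __init__(self, ffn):
        self.ffn = ffn

    def forward(self, hidden):
        # Attention etc. elided; only the FFN slot matters for the swap.
        return self.ffn(hidden)


# The swap: construct the block with the MoE layer instead of the dense FFN.
# In practice this would be something like Block(dMoE(args)) with the layer
# imported from megablocks, rather than Block(DenseFFN()).
block = Block(DenseFFN())
```

The design point is that nothing else in the block needs to change as long as the replacement layer preserves the hidden dimension.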
Awesome! You should be able to just use the dMoE class directly, like you would any other layer. Let me know if you run into any issues!
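For intuition about what the layer being dropped in actually does, here is a toy, dependency-free sketch of the top-k expert routing at the heart of an MoE layer. This is purely illustrative and not megablocks code: names like `router_weights`, `experts`, and `top_k` are assumptions for the sketch, and a real dMoE fuses this routing with block-sparse expert matmuls on the GPU rather than looping in Python.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(token, router_weights, experts, top_k=2):
    """Route one token (a list of floats) to its top_k experts.

    router_weights: one row of router weights per expert.
    experts: list of callables, each mapping a token to an output vector.
    """
    # Router logits: dot product of the token with each expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    # Pick the top_k experts by router probability.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Combine the chosen experts' outputs, weighted by router probability.
    out = [0.0] * len(token)
    for i in chosen:
        y = experts[i](token)
        out = [o + probs[i] * yi for o, yi in zip(out, y)]
    return out
```

A usage example: with two experts (one that doubles the token, one that negates it) and router rows that score each expert against a different input coordinate, `moe_forward([1.0, 0.0], [[1, 0], [0, 1]], experts, top_k=2)` blends both expert outputs by their softmax weights.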
Is it possible to import the dMoE model itself into another training script without training via Megatron?