Open shamanez opened 2 months ago
+1 to @shamanez request. It would be great to elaborate more on how to integrate "JetMoE" with Megatron both for pretraining and finetuning via Megablocks.
We'd love a community PR to upstream the changes from JetMoE!
@tgale96
The JetMoE technical report mentions how they used MegaBlocks with Megatron to train the model.
The author also shared this fork of MegaBlocks that was used during training.
Could you please let us know how we can proceed with a fine-tuning script?