databricks / megablocks


Add a fine-tune script for JetMoE #105

Open shamanez opened 2 months ago

shamanez commented 2 months ago

@tgale96

The JetMoE technical report mentions that they used Megablocks together with Megatron to train the model.

The author then shared this fork of Megablocks that was used during training.

Could you please let us know how we can proceed with a fine-tuning script?
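To make the request concrete, here is a minimal sketch of the kind of fine-tuning step we have in mind, assuming the `dMoE` layer and `Arguments` fields shown in the MegaBlocks README (placeholder sizes and objective, not JetMoE's actual recipe):

```python
# Hypothetical sketch only: layer/argument names follow the MegaBlocks README
# (dMoE, Arguments); the sizes and objective are placeholders, not JetMoE's
# actual configuration or training recipe.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,       # model dimension (placeholder)
    ffn_hidden_size=4096,   # per-expert FFN width (placeholder)
    moe_num_experts=8,
    moe_top_k=2,
)
moe = dMoE(args).cuda()     # precision/device settings may need adjusting
                            # for the MegaBlocks kernels on your setup

optimizer = torch.optim.AdamW(moe.parameters(), lr=1e-5)

# One dummy fine-tuning step on random activations standing in for a real batch.
x = torch.randn(2, 512, 1024, device="cuda")
out = moe(x)
if isinstance(out, tuple):  # some configurations return (output, bias)
    out = out[0]
loss = out.float().pow(2).mean()  # placeholder objective
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice this would replace the FFN blocks inside the JetMoE/Megatron model rather than train a standalone layer, which is exactly the integration we are hoping to see documented.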

Malikeh97 commented 2 months ago

+1 to @shamanez's request. It would be great to have more guidance on how to integrate JetMoE with Megatron, for both pretraining and fine-tuning via Megablocks.

mvpatel2000 commented 2 months ago

We'd love a community PR to upstream the changes from JetMoE!