Closed: wdlctc closed this issue 2 months ago.
What is the export phase?
Expert parallelism, sorry for the typo.
I see; you can activate expert parallelism here https://github.com/allenai/OLMo/blob/cd0004be3f5a82fff8b4b990a00be1377e084eac/olmo/config.py#L1369, but I don't think it is working at the moment, as the expert parameters did not show up in the parameter count when I tried. I think you need to do something with a device mesh, like here: https://github.com/mosaicml/llm-foundry/blob/e8eca4fa83f3fec69ad482465f839fb7dcfbfb0d/llmfoundry/models/utils/config_moe_args.py#L68 (rough sketch of that step below).
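For reference, the device-mesh part would look roughly like the sketch below. This is plain PyTorch, not OLMo or llm-foundry code; the `expert_parallel_degree` value and the final wiring into the MoE block are assumptions you would need to adapt to the actual MoE implementation.

```python
# Rough sketch of the device-mesh step only (not OLMo or llm-foundry code).
# Assumes PyTorch >= 2.2 and a launch via torchrun so rank/world-size env vars exist.
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

world_size = dist.get_world_size()
expert_parallel_degree = 4  # hypothetical: number of ranks each expert set is sharded over
data_parallel_degree = world_size // expert_parallel_degree

# 2-D mesh: outer dim for data parallelism (FSDP), inner dim for expert parallelism.
mesh = init_device_mesh(
    "cuda",
    (data_parallel_degree, expert_parallel_degree),
    mesh_dim_names=("dp", "ep"),
)

# Each MoE block would then be given the process group of the "ep" dimension so its
# experts are sharded across those ranks; exactly how that group is handed to the MoE
# layer is what the llm-foundry link above wires up.
ep_group = mesh["ep"].get_group()
```

The idea is that the experts are split across the "ep" dimension while everything else stays data-parallel across "dp".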
Anyway, regular FSDP with full sharding is enough for most models and is what we used for OLMoE-1B-7B.
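For anyone reading along, "full sharding" here corresponds to PyTorch FSDP's `ShardingStrategy.FULL_SHARD`, where parameters, gradients, and optimizer state are all sharded across data-parallel ranks. Below is a minimal illustration in plain PyTorch, assuming a hypothetical `build_model()` helper; the OLMo trainer does the equivalent wrapping internally.

```python
# Minimal illustration of "regular FSDP with full sharding" in plain PyTorch.
# build_model() is a hypothetical stand-in for constructing the (MoE) model.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = build_model().cuda()  # hypothetical helper

model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, and optimizer state
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.float32,
    ),
    device_id=torch.cuda.current_device(),
)
```

FULL_SHARD trades extra all-gather/reduce-scatter communication for the lowest per-rank memory footprint, which is why it can be sufficient here without expert parallelism.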
Got it, many thanks! Does the default pretraining script use FSDP? And do you know where I can adjust the FSDP settings?
Yes, it uses FSDP, and you can change its settings here: https://github.com/allenai/OLMoE/blob/b032a4a4984c3ec3cee21f81f26b70fa5f788a09/configs/OLMoE-1B-7B-0824.yml#L93
Hello OLMoE team,
I'm currently exploring training scripts for Mixture-of-Experts (MoE) models and was wondering if there are any existing or planned scripts that handle expert parallelism during the export phase for MoE models? Specifically, I'm interested in techniques for parallelizing the export process for efficient training in distributed environments.
If not, could you provide any guidance on how to implement this or any references that would be useful for such a setup?
Thank you!
Best regards, cheng luo