NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
11.66k stars, 2.44k forks

Use Mcore's Modelopt Spec When Exporting #9487

Closed suiyoubi closed 1 month ago

suiyoubi commented 3 months ago

What does this PR do?

Unifies export to use mcore's ModelOpt spec instead of the NeMo one.

janekl commented 3 months ago

Thanks, approved. Please just make sure the CI pipeline passes.

github-actions[bot] commented 2 months ago

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

janekl commented 2 months ago

Hi @suiyoubi, do you think we could get this in? It just needs conflict resolution.

janekl commented 2 months ago

Hmm, I think @akoumpa extended https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/models/language_modeling/megatron/gpt_layer_modelopt_spec.py to cover MoE.

So we can't really replace the NeMo spec unless we also enable an MoE layer spec in https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/inference/gpt/model_specs.py
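
To illustrate the constraint, here is a minimal sketch of what a guarded switch could look like. Everything in it is an assumption made for illustration: `LayerSpec`, `nemo_modelopt_spec`, `mcore_modelopt_spec`, `select_export_spec`, and the `mcore_supports_moe` flag are hypothetical stand-ins, not the actual NeMo or Megatron-LM APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerSpec:
    """Hypothetical stand-in for a layer spec; not the real mcore ModuleSpec."""
    source: str  # which codebase provided the spec: "nemo" or "mcore"
    mlp: str     # "dense" or "moe"


def nemo_modelopt_spec(num_moe_experts: Optional[int] = None) -> LayerSpec:
    # Assumption: mirrors the comment above, where the NeMo spec covers MoE.
    return LayerSpec(source="nemo", mlp="moe" if num_moe_experts else "dense")


def mcore_modelopt_spec() -> LayerSpec:
    # Assumption: the mcore spec is dense-only until an MoE layer spec is added.
    return LayerSpec(source="mcore", mlp="dense")


def select_export_spec(
    num_moe_experts: Optional[int] = None,
    mcore_supports_moe: bool = False,
) -> LayerSpec:
    """Prefer the mcore spec, but keep the NeMo spec for MoE models
    as long as the mcore spec cannot express MoE layers."""
    is_moe = bool(num_moe_experts)
    if not is_moe:
        return mcore_modelopt_spec()
    if mcore_supports_moe:
        # Hypothetical future state in which mcore gains MoE coverage.
        return LayerSpec(source="mcore", mlp="moe")
    return nemo_modelopt_spec(num_moe_experts)


if __name__ == "__main__":
    print(select_export_spec())                   # dense model
    print(select_export_spec(num_moe_experts=8))  # MoE model
```

Under these assumptions, dense models could move to the mcore spec right away, while MoE models would stay on the NeMo spec until the mcore side gains an MoE layer spec.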

github-actions[bot] commented 1 month ago

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions[bot] commented 1 month ago

This PR was closed because it has been inactive for 7 days since being marked as stale.