NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start
Other
10.62k stars, 2.38k forks

Fix: Resolve multimodal model errors and update README usage instructions #1286

Open singleheart opened 1 week ago

singleheart commented 1 week ago

This pull request makes the following changes:

  1. Bug Fix: Correct 'sample.answers' access

    • Changed cur_answer = sample.answers to cur_answer = sample.answers['value'] so that the answer text is read from the 'value' field rather than from the enclosing object.
  2. Bug Fix: Apply 'mistral_custom_template' to llama

    • Changed custom_chat_template=None to custom_chat_template=mistral_custom_template to resolve template-related errors with the llama3 model.
  3. Docs Update: Improve usage instructions in README.md

    • Added a missing option to the usage example in the megatron-energon section.
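
For reviewers, the data access issue in item 1 can be illustrated with a minimal sketch. It assumes that sample.answers arrives as a mapping with a 'value' key, as the fix implies. The Sample class and the get_answer helper are hypothetical stand-ins for illustration only; they are not part of Megatron-LM or megatron-energon, and the PR itself indexes ['value'] directly rather than using a helper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Sample:
    """Hypothetical stand-in for a dataset sample (not the real class)."""
    answers: Any


def get_answer(sample: Sample) -> Any:
    """Return the answer payload, unwrapping a {'value': ...} mapping if present.

    Assumption: answers may be stored as a dict with a 'value' key.
    Any other type is returned unchanged.
    """
    answers = sample.answers
    if isinstance(answers, dict):
        return answers["value"]
    return answers


# Dict-shaped answers: using sample.answers directly would hand the whole
# dict to downstream code, so the helper unwraps the 'value' field.
wrapped = Sample(answers={"value": "a cat on a sofa"})
print(get_answer(wrapped))

# Plain answers pass through untouched.
plain = Sample(answers="a dog in a park")
print(get_answer(plain))
```

The tolerant isinstance check is a design choice made only for this sketch; the change in this PR assumes the dict shape unconditionally.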

These changes fix two errors in the multimodal model code and bring the README usage instructions up to date. Please review and provide feedback. Thank you!