NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment

Apache License 2.0


feat: DPO support for global padding of seq_len to a multiple #386

Closed terrykong closed 1 week ago

terrykong commented 2 weeks ago

What does this PR do?

Adds pad_to_multiple_of support for DPO. Sequence parallelism requires it, and sequence parallelism is in turn required for MoE models with TP.
- The argument pad_length_to_multiple_of pads all minibatches to the same length when it is > 0. When it is 0, the behavior is the same as before.

Needed for:

- sequence parallel
- mamba

Rebase stack

Changelog

Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this
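
The padding rule described above can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: the helper names `round_up_to_multiple` and `pad_minibatches` are hypothetical, sequences are plain Python lists of token ids, and the "behavior as before" branch assumes that each minibatch was previously padded only to its own longest sequence.

```python
from typing import List


def round_up_to_multiple(length: int, multiple: int) -> int:
    """Round `length` up to the nearest multiple of `multiple` (0 means no rounding)."""
    if multiple < 0:
        raise ValueError("multiple must be >= 0")
    if multiple == 0:
        return length
    return ((length + multiple - 1) // multiple) * multiple


def pad_minibatches(
    minibatches: List[List[List[int]]],
    pad_id: int,
    pad_length_to_multiple_of: int = 0,
) -> List[List[List[int]]]:
    """Hypothetical helper: right-pad token-id sequences in every minibatch."""
    if pad_length_to_multiple_of == 0:
        # Assumed previous behavior: pad each minibatch to its own max length.
        targets = [max(len(seq) for seq in mb) for mb in minibatches]
    else:
        # Global padding: one shared length, rounded up to the multiple.
        global_max = max(len(seq) for mb in minibatches for seq in mb)
        shared = round_up_to_multiple(global_max, pad_length_to_multiple_of)
        targets = [shared] * len(minibatches)
    return [
        [seq + [pad_id] * (target - len(seq)) for seq in mb]
        for mb, target in zip(minibatches, targets)
    ]


if __name__ == "__main__":
    batches = [[[1, 2, 3], [4]], [[5, 6, 7, 8, 9]]]
    padded = pad_minibatches(batches, pad_id=0, pad_length_to_multiple_of=4)
    print([len(seq) for mb in padded for seq in mb])
```

With a multiple of 4 and a longest sequence of 5 tokens, every sequence in every minibatch ends up with length 8, so the sequence dimension can be split evenly across ranks whose count divides 4.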

Before your PR is "Ready for review"

Pre checks:

[ ] Make sure you read and followed Contributor guidelines
[ ] Did you write any new necessary tests?
[ ] Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide which contains the tutorials

Checklist when contributing a new algorithm

[ ] Does the trainer resume and restore all model states?
[ ] Does the trainer support all parallelism techniques (PP, TP, DP)?
[ ] Does the trainer support max_steps=-1 and validation?
[ ] Does the trainer only call APIs defined in alignable_interface.py?
[ ] Does the trainer have proper logging?

Additional Information

Related to # (issue)