Open terrykong opened 2 weeks ago
The DPO dataset changes should stand on their own, but they are needed to test the mcore opt changes for MoE. If the MoE issues take too long to resolve, I'll split this PR up.
What does this PR do?
Rebase stack
Changelog
Usage
Before your PR is "Ready for review"
Pre checks:
Checklist when contributing a new algorithm
max_steps=-1 and validation
Additional Information