Closed trias702 closed 1 week ago
What does this PR do?
Adds wandb logging of avg generation length for SPIN as well as a table showing sample generated responses for their corresponding prompts.
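As a rough illustration of the kind of logging described above (not the code in this PR), the sketch below computes an average generation length and assembles prompt/response rows for a wandb table. The function names, the metric keys `avg_generation_length` and `sample_generations`, and the `max_rows` cap are all hypothetical; only `wandb.Table` and `wandb.log` are real wandb calls, and they assume a run was already started with `wandb.init()`.

```python
from typing import List, Sequence


def avg_generation_length(response_lengths: Sequence[int]) -> float:
    """Mean number of generated tokens per response (0.0 for an empty batch)."""
    if not response_lengths:
        return 0.0
    return sum(response_lengths) / len(response_lengths)


def build_sample_rows(
    prompts: Sequence[str], responses: Sequence[str], max_rows: int = 4
) -> List[List[str]]:
    """Pair each prompt with its generated response, keeping at most max_rows rows."""
    return [[p, r] for p, r in zip(prompts, responses)][:max_rows]


def log_generations(
    step: int,
    response_lengths: Sequence[int],
    prompts: Sequence[str],
    responses: Sequence[str],
) -> None:
    """Log the scalar metric and the sample table (hypothetical metric keys)."""
    import wandb  # imported lazily; assumes wandb.init() has already been called

    table = wandb.Table(
        columns=["prompt", "response"],
        data=build_sample_rows(prompts, responses),
    )
    wandb.log(
        {
            "avg_generation_length": avg_generation_length(response_lengths),
            "sample_generations": table,
        },
        step=step,
    )
```

Capping the number of table rows keeps the logged payload small; the real trainer may pick samples differently.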
Before your PR is "Ready for review"
Pre checks:
Checklist when contributing a new algorithm
max_steps=-1 and validation?
Additional Information