Bump trl from 0.8.6 to 0.9.4

Bumps trl from 0.8.6 to 0.9.4.
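Before relying on behaviour introduced after 0.8.6, it can help to confirm which release is actually installed in a given environment. The following is a minimal sketch using only the standard library; `parse_release` and `installed_at_least` are hypothetical helper names, and the parser is a simplification that only understands plain numeric `X.Y.Z` components.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple:
    """Turn a release string into a tuple of ints for comparison.

    Simplifying assumption: parsing stops at the first component that is
    not purely numeric, so '0.9.4.dev0' is read as (0, 9, 4).
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_at_least(package: str, minimum: str) -> bool:
    """Return True if `package` is installed at `minimum` or newer."""
    try:
        found = version(package)
    except PackageNotFoundError:
        # Package is not installed in this environment.
        return False
    return parse_release(found) >= parse_release(minimum)
```

Calling `installed_at_least("trl", "0.9.4")` in the target environment then reports whether this bump has taken effect there.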

Release notes

v0.9.4

Mainly backward compatibility fixes with SFTTrainer.
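The compatibility concern is that code written against older releases may still pass a generic arguments object where a trainer-specific config is now expected. One common way to handle such a transition is to upgrade the older object rather than reject it. The sketch below illustrates that general pattern with stand-in dataclasses only; `BaseArgs`, `SftArgs`, and `coerce_args` are hypothetical names for illustration and are not trl's classes or the actual fix in this release.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BaseArgs:
    """Stand-in for a generic training-arguments object (hypothetical)."""
    output_dir: str
    learning_rate: float = 5e-5


@dataclass
class SftArgs(BaseArgs):
    """Stand-in for a trainer-specific config extending the base (hypothetical)."""
    max_seq_length: Optional[int] = None
    packing: bool = False


def coerce_args(args: BaseArgs) -> SftArgs:
    """Accept either type; upgrade a plain BaseArgs by filling default extras."""
    if isinstance(args, SftArgs):
        # Already the newer config type: return it unchanged.
        return args
    # Copy every base field across; trainer-specific fields keep their defaults.
    return SftArgs(**asdict(args))
```

With this shape, an older call site that builds `BaseArgs(output_dir="out")` keeps working, while newer call sites can pass `SftArgs` directly and set the extra fields.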

What's Changed

Fixed doc string and related docs for the SFTConfig update by @GuilhermeFreire in huggingface/trl#1706

SFTTrainer: Fix backward Compatibility issue with TrainingArguments by @younesbelkada in huggingface/trl#1707

0.9.4 release by @vwxyzjn in huggingface/trl#1708

New Contributors

@GuilhermeFreire made their first contribution in huggingface/trl#1706

Full Changelog: https://github.com/huggingface/trl/compare/v0.9.3...v0.9.4

v0.9.3 RLOO / PPOv2 Trainer, RM Visualization

We are excited to introduce the v0.9.3 release, which brings many new features and algorithms. The highlights are as follows:

RLOO Trainer: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by Ahmadian et al. from Cohere. Check out our docs to get started.
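The leave-one-out idea can be sketched in a few lines: each of the k completions sampled for a prompt is scored against the mean reward of the other k - 1 completions. This is a minimal illustration of that baseline as described in the RLOO paper, not trl's trainer code; `rloo_advantages` is a hypothetical helper name.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k sampled completions of one prompt.

    For sample i, the baseline is the mean reward of the other k - 1 samples.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two samples per prompt")
    total = sum(rewards)
    # (total - r) / (k - 1) is the mean of all rewards except r itself.
    return [r - (total - r) / (k - 1) for r in rewards]
```

For example, `rloo_advantages([1.0, 2.0, 3.0])` returns `[-1.5, 0.0, 1.5]`: the middle sample matches the mean of its peers, so it receives no learning signal, while the best and worst samples are pushed in opposite directions.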

PPOv2 Trainer: We are introducing a new experimental PPOv2 trainer that is more closely aligned with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs to get started.

Reward model visualization: the reward model training now includes visualization on the eval dataset, as shown below.

https://github.com/huggingface/trl/assets/5555347/6575a879-cb2f-4e2e-bb84-a76707f9de84

New losses in the DPO Trainer: DPOTrainer now includes losses / support for Self-play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment

New losses in the KTO Trainer: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO)

What's Changed

set dev version by @younesbelkada in huggingface/trl#1568

fix add_special_tokens issue for data with template by @edixiong in huggingface/trl#1509

[DPO] add 'bco_pair' loss_type by @seanexp in huggingface/trl#1524

[DPO] DPOConfig class by @kashif in huggingface/trl#1554

[SFT] add SFT Trainer Config dataclass by @kashif in huggingface/trl#1530

FIX: Fix CI on transformers main by @younesbelkada in huggingface/trl#1576

[SFTTrainer] Add warning in SFTTrainer when dataset already processed by @younesbelkada in huggingface/trl#1577

Fix typo detoxifying doc by @qgallouedec in huggingface/trl#1594

Core: removed unexisting SftArgumentParser by @younesbelkada in huggingface/trl#1602

[KTOTrainer] add BCO (reward shift and underlying distribution matching) by @seanexp in huggingface/trl#1599

[CLI] Use auto device map for model load by @lewtun in huggingface/trl#1596

Removing tests/ from package data by @jamesbraza in huggingface/trl#1607

Docs: Fix build main documentation by @younesbelkada in huggingface/trl#1604

support loss function for Self-play Preference Optimization by @winglian in huggingface/trl#1612

Update HH dataset on helpful only subset by @vwxyzjn in huggingface/trl#1613

corrects loss function for Self-play Preference Optimization hard label version by @angelahzyuan in huggingface/trl#1615

... (truncated)

Commits

974b0d3 0.9.4 release (#1708)
39a7d1c SFTTrainer: Fix backward Compatibility issue with TrainingArguments (#1707)
0bdc638 Fixed doc string and docs for the SFTConfig update (#1706)
275d33b 0.9.3 release (#1699)
c0819ee Update sft_trainer.py (#1698)
a03e7cc Release 0.9.2 (#1697)
a13cb89 Quick fix on GPT4-eval (#1696)
84156f1 Fix typo in DPOTrainer's warnings (#1688)
4eb0b90 Skip packing validation (#1673)
6c203f9 Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig (#...
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

foundation-model-stack / fms-hf-tuning

Bump trl from 0.8.6 to 0.9.4 #210
