caikit / caikit-nlp

Apache License 2.0

⬆️ Bump peft from 0.6.0 to 0.10.0 #338

Closed · dependabot[bot] closed 1 month ago

dependabot[bot] commented 5 months ago

Bumps peft from 0.6.0 to 0.10.0.

Release notes

Sourced from peft's releases.

v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA

Highlights


Support for QLoRA with DeepSpeed ZeRO3 and FSDP

We added a couple of changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB of memory each. Besides the latest version of PEFT, this requires `bitsandbytes>=0.43.0`, `accelerate>=0.28.0`, `transformers>4.38.2`, and `trl>0.7.11`. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.
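As a rough sketch of the setup (the model id, LoRA hyperparameters, and the `bnb_4bit_quant_storage` choice below are assumptions, not quoted from these notes), loading a 4-bit base model so that FSDP or ZeRO3 can shard it looks like this:

```python
# Hedged sketch: preparing a 4-bit (QLoRA) model for FSDP/DeepSpeed ZeRO3.
# Assumes bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storing the quantized weights in bf16 is what lets FSDP shard them
    # across GPUs (assumption based on the FSDP-QLoRA recipe).
    bnb_4bit_quant_storage=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
```

Training would then be launched through accelerate with an FSDP or DeepSpeed config; see the linked docs for the full recipe.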

Layer replication

First-time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs very little extra memory but can lead to a nice improvement in model performance. Find out more in our docs.
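As a minimal sketch of the configuration (the exact index semantics here are an assumption; the docs are authoritative):

```python
# Hedged sketch: layer replication via LoraConfig (PEFT >= 0.10.0).
from peft import LoraConfig

config = LoraConfig(
    # Each [start, end) range selects layers of the base model; the final
    # model stacks the listed ranges in order, so overlapping ranges
    # duplicate layers. Here a 24-layer base would expand to 32 layers.
    layer_replication=[(0, 16), (8, 24)],
    task_type="CAUSAL_LM",
)
```

Because the duplicated layers share the base weights and only receive their own LoRA adapters, the memory overhead stays small.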

Improving DoRA

In the last release, we added the option to enable DoRA in PEFT simply by adding `use_dora=True` to your `LoraConfig`. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.
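Enabling DoRA remains a one-line change to the config; a minimal sketch (the target modules are placeholders):

```python
# Hedged sketch: DoRA, which as of this release also covers Conv2d layers
# and bitsandbytes-quantized linear layers.
from peft import LoraConfig

config = LoraConfig(
    use_dora=True,
    target_modules=["q_proj", "v_proj"],  # placeholder module names
)
```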

Mixed LoRA adapter batches

If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) to different samples in the same batch. To do this, pass a list of adapter names as an additional argument to the forward call. For example, if you have a batch of three samples:

```python
output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])
```

Here, "adapter1" and "adapter2" should be the same name as your corresponding LoRA adapters and "__base__" is a special name that refers to the base model without any adapter. Find more details in our docs.

Without this feature, if you wanted to run inference with different LoRA adapters, you had to pass single samples or group batches by adapter and switch between adapters using `set_adapter`, which is both inefficient and inconvenient. The new per-sample method is faster and should be preferred in this scenario, as the sketch below illustrates.
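For contrast, the old pattern looked roughly like this (a hedged sketch; the input variables are placeholders, while `set_adapter` is the existing adapter-switching API):

```python
# Old approach: group inputs by adapter and switch between adapters, which
# serializes what the new adapter_names argument does in one forward pass.
model.set_adapter("adapter1")
out1 = model(**inputs_for_adapter1)  # placeholder batch for adapter1
model.set_adapter("adapter2")
out2 = model(**inputs_for_adapter2)  # placeholder batch for adapter2
```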

New LoftQ initialization function

We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.

Using the new `replace_lora_weights_loftq` function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4-bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
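A minimal sketch of the one-step path (assuming `replace_lora_weights_loftq` is importable from the package root; the model id is a placeholder and must ship safetensors weights):

```python
# Hedged sketch: one-step LoftQ initialization (PEFT >= 0.10.0). Requires a
# 4-bit bitsandbytes-quantized model stored in the safetensors format.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

base = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
peft_model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

# Re-initialize the LoRA weights in place; no extra copy of the quantized
# weights is needed.
replace_lora_weights_loftq(peft_model)
```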

What's Changed

Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.

... (truncated)

Dependabot compatibility score

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

dependabot[bot] commented 1 month ago

Superseded by #378.