v0.12.0: New methods OLoRA, X-LoRA, FourierFT, HRA, and much more
Highlights
New methods
OLoRA
@tokenizer-decode added support for a new LoRA initialization strategy called OLoRA (#1828). With this initialization option, the LoRA weights are initialized to be orthonormal, which promises to improve training convergence. Similar to PiSSA, this can also be applied to models quantized with bitsandbytes. Check out the accompanying OLoRA examples.
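For reference, a minimal sketch of how this initialization is selected via `LoraConfig`'s `init_lora_weights` argument (the model id and hyperparameters are placeholders, not from the release notes):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder model

# init_lora_weights="olora" selects the new orthonormal (OLoRA) initialization
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="olora",
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```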
X-LoRA
@EricLBuehler added the X-LoRA method to PEFT (#1491). This is a mixture-of-experts approach that combines the strengths of multiple pre-trained LoRA adapters. Documentation has yet to be added, but check out the X-LoRA tests to see how to use it.
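Since the documentation was still missing at release time, the following is only a rough sketch of the intended usage; the exact parameter names (`hidden_size`, `adapters`, the adapter paths) are assumptions and should be verified against the X-LoRA tests:

```python
from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder model

config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=base_model.config.hidden_size,
    # adapter names -> paths of already trained LoRA checkpoints (placeholders)
    adapters={
        "adapter_0": "path/to/lora_adapter_0",
        "adapter_1": "path/to/lora_adapter_1",
    },
)
model = get_peft_model(base_model, config)
```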
HRA
@DaShenZi721 added support for Householder Reflection Adaptation (HRA) (#1864). This method bridges the gap between low-rank adapters such as LoRA on the one hand and orthogonal fine-tuning techniques such as OFT and BOFT on the other. As such, it is interesting for both LLMs and image generation models. Check out the HRA example on how to perform DreamBooth fine-tuning.
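A minimal sketch of applying HRA to a causal LM; the model id, target module names, and the value of `r` are placeholders (see the linked example for the DreamBooth setup):

```python
from transformers import AutoModelForCausalLM
from peft import HRAConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder model

# r is the number of Householder reflections per adapted layer (placeholder value)
config = HRAConfig(r=8, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base_model, config)
```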
Enhancements
IA³ now supports merging of multiple adapters via the add_weighted_adapter method thanks to @alexrs (#1701).
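A short sketch, assuming a `PeftModel` with two IA³ adapters already loaded under placeholder names:

```python
# `model` is assumed to be a PeftModel that already has two IA³ adapters loaded,
# here under the placeholder names "adapter_a" and "adapter_b".
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="merged",
)
model.set_adapter("merged")  # activate the newly created weighted combination
```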
Call peft_model.get_layer_status() and peft_model.get_model_status() to get an overview of the layer/model status of the PEFT model. This can be especially helpful when dealing with multiple adapters or for debugging purposes. More information can be found in the docs (#1743).
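For example, with `peft_model` being any PEFT model:

```python
# Per-layer view of the injected adapter layers (devices, dtypes, available adapters, ...)
layer_status = peft_model.get_layer_status()

# Aggregated summary over the whole model
model_status = peft_model.get_model_status()
print(model_status)
```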
DoRA now supports FSDP training, including with bitsandbytes quantization, aka QDoRA (#1806).
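A hedged sketch of the model/adapter setup for QDoRA; the model id and hyperparameters are placeholders, and the FSDP side is configured through the usual `accelerate` launch config, which is not shown here:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model
    quantization_config=bnb_config,
)

# use_dora=True turns the LoRA adapter into a DoRA adapter ("QDoRA" with the 4-bit base model)
config = LoraConfig(r=16, use_dora=True, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, config)
```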
VeRA has been extended by @dkopi to support targeting layers with different weight shapes (#1817).
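A minimal sketch, assuming an OPT-style model where the attention and MLP projections have different weight shapes (module names are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder model

# q_proj and fc1 have different weight shapes; targeting both at once is now supported
config = VeraConfig(r=256, target_modules=["q_proj", "fc1"])
model = get_peft_model(base_model, config)
```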
@kallewoof added the possibility of ephemeral GPU offloading. For now, this is only implemented for loading DoRA models, where it can speed up loading considerably for big models at the cost of a bit of extra VRAM (#1857).
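A sketch of how this is expected to be enabled when loading a DoRA model; the `ephemeral_gpu_offload` argument name is taken from this feature and should be treated as an assumption, and the paths are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("path/to/big-base-model")  # placeholder

# ephemeral_gpu_offload=True temporarily uses the GPU for the expensive DoRA init
# computations while otherwise keeping the offloaded weights where they are
model = PeftModel.from_pretrained(
    base_model,
    "path/to/dora-adapter",  # placeholder adapter id
    ephemeral_gpu_offload=True,
)
```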
Experimental: It is now possible to tell PEFT to use your own custom LoRA layers through dynamic dispatching. Use this, for instance, to add LoRA support for layer types that PEFT does not cover yet, without having to first create a PR on PEFT (but contributions are still welcome!) (#1875).
@rahulbshrestha contributed a notebook that shows how to fine-tune a DNA language model with LoRA.
Changes
Casting of the adapter dtype
Important: If the base model is loaded in float16 (fp16) or bfloat16 (bf16), PEFT now autocasts adapter weights to float32 (fp32) instead of using the dtype of the base model (#1706). This requires more memory than previously but stabilizes training, so it's the more sensible default. To prevent this, pass autocast_adapter_dtype=False when calling get_peft_model, PeftModel.from_pretrained, or PeftModel.load_adapter.
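For example, to keep the previous behavior (model id and hyperparameters are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m", torch_dtype=torch.bfloat16)
config = LoraConfig(r=16, target_modules=["q_proj", "v_proj"])

# By default, the adapter weights would now be upcast to float32;
# autocast_adapter_dtype=False keeps them in bf16 (the previous behavior).
model = get_peft_model(base_model, config, autocast_adapter_dtype=False)
```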
Adapter device placement
The logic of device placement when loading multiple adapters on the same model has been changed (#1742). Previously, PEFT would move all adapters to the device of the base model. Now, only the newly loaded/created adapter is moved to the base model's device. This allows users to have more fine-grained control over the adapter devices, e.g. allowing them to offload unused adapters to CPU more easily.
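A small sketch to illustrate the new behavior (model id and adapter paths are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")  # placeholder model
model = PeftModel.from_pretrained(base_model, "path/to/adapter-one")  # placeholder adapter ids

# Only "adapter-two" is moved to the base model's device here; previously, PEFT would
# also have moved every other adapter (e.g. one you had offloaded to CPU) back to "cuda".
model.load_adapter("path/to/adapter-two", adapter_name="adapter-two")
```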
You can trigger a rebase of this PR by commenting `@dependabot rebase`.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.
Bumps peft from 0.6.0 to 0.12.0.
Release notes
Sourced from peft's releases.
... (truncated)
Commits
- e6cd24c Release v0.12.0 (#1946)
- 05f57e9 PiSSA, OLoRA: Delete initial adapter after conversion instead of the active a...
- 2ce83e0 FIX Decrease memory overhead of merging (#1944)
- ebcd079 [WIP] ENH Add support for Qwen2 (#1906)
- ba75bb1 FIX: More VeRA tests, fix tests, more checks (#1900)
- 6472061 FIX Prefix tuning Grouped-Query Attention (#1901)
- e02b938 FIX PiSSA & OLoRA with rank/alpha pattern, rslora (#1930)
- 5268495 FEAT Add HRA: Householder Reflection Adaptation (#1864)
- 2aaf9ce ENH Sync LoRA tp_layer methods with vanilla LoRA (#1919)
- a019f86 FIX sft script print_trainable_parameters attr lookup (#1928)