While following different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-trained models to specific tasks or domains using a small set of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed from a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs impacts the model capacity and regularity. This analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators.
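To make the mechanism concrete, here is a minimal PyTorch sketch of the core idea (our illustration, not the repository's implementation; `householder_chain` and `HRALinear` are hypothetical names):

```python
import torch

def householder_chain(U: torch.Tensor) -> torch.Tensor:
    """Compose r Householder reflections H_i = I - 2 u_i u_i^T / ||u_i||^2
    into a single orthogonal matrix Q = H_1 H_2 ... H_r."""
    d, r = U.shape
    Q = torch.eye(d, dtype=U.dtype, device=U.device)
    for i in range(r):
        u = U[:, i : i + 1]                        # (d, 1) reflection vector
        Q = Q - 2.0 * (Q @ u) @ (u.T / (u.T @ u))  # right-multiply by H_i
    return Q

class HRALinear(torch.nn.Module):
    """A frozen linear layer adapted by an orthogonal Householder chain."""
    def __init__(self, weight: torch.Tensor, r: int = 8):
        super().__init__()
        self.weight = torch.nn.Parameter(weight, requires_grad=False)  # frozen W
        d = weight.shape[1]
        # r must be even: identical adjacent reflections cancel (H H = I),
        # so the chain starts as the identity and W Q starts exactly at W.
        u = torch.randn(d, r // 2)
        self.U = torch.nn.Parameter(u.repeat_interleave(2, dim=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        Q = householder_chain(self.U)    # orthogonal (d, d), from d * r parameters
        return x @ (self.weight @ Q).T   # adapted weight is W Q
```

Since each reflection is a rank-one update of the identity, the overall update W Q - W has rank at most r, which is the sense in which HRA behaves like an adaptive low-rank adaptation.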
Given several images of a specific subject and a textual prompt, subject-driven generation aims to generate images of the same subject in a context aligning with the prompt.
cd generation
conda env create -f env.yml
Download the DreamBooth dataset by running this script:
cd subject
bash download_dreambooth.sh
After downloading the data, your directory structure should look like this:
dreambooth
├── dataset
│   ├── backpack
│   ├── backpack_dog
│   ...
You can also put your custom images into `dreambooth/dataset`.
prompt_idx=0
class_idx=0
./train_dreambooth.sh $prompt_idx $class_idx
where `$prompt_idx` corresponds to different prompts ranging from 0 to 24 and `$class_idx` corresponds to different subjects ranging from 0 to 29.
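If you want to sweep all combinations, a small driver loop like the following may help (our convenience snippet; it simply calls the training script once per pair):

```python
import subprocess

# Hypothetical sweep: train once per (prompt, subject) pair.
for class_idx in range(30):        # subjects 0-29
    for prompt_idx in range(25):   # prompts 0-24
        subprocess.run(
            ["bash", "./train_dreambooth.sh", str(prompt_idx), str(class_idx)],
            check=True,
        )
```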
Launch the training script with `accelerate` and pass hyperparameters, as well as HRA-specific arguments, such as:

- `use_hra`: enables HRA in the training script.
- `hra_r`: the number of HRs (i.e., r) across different layers, expressed as an `int`. As r increases, the number of trainable parameters increases, which generally leads to improved performance, but also results in higher memory consumption and longer computation times; therefore, r is usually set to 8. Note: please set r to an even number to avoid potential issues during initialization (see the sketch after this list).
- `hra_apply_GS`: applies Gram-Schmidt orthogonalization. Default is `false`.
- `hra_bias`: specifies whether the `bias` parameters should be trained. Can be `none`, `all`, or `hra_only`.
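As a sanity check of the even-r note above, the following NumPy sketch (our illustration, not repository code) shows that a Householder reflection is orthogonal and involutory, so pairing identical reflections at initialization makes the adapter start as the identity:

```python
import numpy as np

def householder(u: np.ndarray) -> np.ndarray:
    """H = I - 2 u u^T / ||u||^2: an orthogonal, involutory reflection."""
    u = u.reshape(-1, 1)
    return np.eye(len(u)) - 2.0 * (u @ u.T) / (u.T @ u)

rng = np.random.default_rng(0)
H = householder(rng.standard_normal(6))

assert np.allclose(H @ H.T, np.eye(6))  # orthogonal
assert np.allclose(H @ H, np.eye(6))    # involutory: applying H twice is identity

# With even r, reflections initialized in identical pairs compose to
# Q = (H1 H1)(H2 H2) ... = I, so the adapted weight W Q starts exactly at W.
```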
After training, run the evaluation and collect the results:
python evaluate.py
python get_result.py
Controllable generation aims to generate images aligning with a textual prompt and additional control signals (such as facial landmark annotations, canny edges, and segmentation maps).
Download the ADE20K and CelebA-HQ datasets by running these scripts:
cd control
bash download_ade20k.sh
bash download_celebhq.sh
For the COCO dataset, we follow OFT to download and preprocess it.
After downloading the data, your directory structure should look like this:
data
├── ADE20K
│   ├── train
│   │   ├── color
│   │   ├── segm
│   │   └── prompt_train_blip.json
│   └── val
│       ├── color
│       ├── segm
│       └── prompt_val_blip.json
└── COCO
    ├── train
    │   ├── color
    │   ├── depth
    ...
Download the pre-trained model weights `v1-5-pruned.ckpt` and save the file in the `models` directory.
python tool_add_hra.py \
--input_path=./models/v1-5-pruned.ckpt \
--output_path=./models/hra_r_8.ckpt \
--r=8
python train.py \
--r=8 \
--control=segm
python generation.py \
--r=8 \
--control=segm
python eval_landmark.py
python eval_canny.py
Note: for evaluating the segmentation map-to-image (S2I) task, please install the Segformer repository, then run the following testing command on both the original and generated images.
python tools/test.py local_configs/segformer/B4/segformer.b4.512x512.ade.160k.py ./weights/segformer.b4.512x512.ade.160k.pth
We adapt DeBERTaV3-base and test the performance of the adapted models on the General Language Understanding Evaluation (GLUE) benchmark.
cd nlu
conda env create -f env.yml
Before fine-tuning, you need to install the dependencies:
python setup.py install
Run this script to download the GLUE dataset:
cache_dir=/tmp/DeBERTa/
cd experiments/glue
./download_data.sh $cache_dir/glue_tasks
Run the tasks:
./mnli.sh
./cola.sh
./mrpc.sh
./qnli.sh
./qqp.sh
./rte.sh
./sst2.sh
./stsb.sh
We have not yet completed the integration of the HRA code into PEFT. Until then, if you want to try fine-tuning large models with HRA, you can follow the steps below.
Go to the `llama` folder:
cd llama
We recommend using Python 3.10 for your environment and creating it with conda:
conda create -n pytorch python=3.10
Then install the required packages with the following command:
pip install -r requirements.txt
Please note that the `peft` and `transformers` packages must be installed with versions consistent with those listed in the requirements file.
After completing the installation, please replace the `oft` folder inside `peft/tuners` within your running environment's `site-packages` with the `oft` folder from the current directory. The path to the `oft` folder in the environment should be:
/your_path/anaconda3/envs/pytorch/lib/python3.10/site-packages/peft/tuners/
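If you prefer to script this replacement, a sketch like the following (our convenience snippet, assuming `peft` is importable and the HRA `oft` folder sits in the current directory) locates the installed path automatically:

```python
import os
import shutil

import peft

# Replace the installed peft/tuners/oft with the HRA version from this repo.
dst = os.path.join(os.path.dirname(peft.__file__), "tuners", "oft")
shutil.rmtree(dst)             # remove the stock implementation
shutil.copytree("./oft", dst)  # copy in the HRA implementation
print(f"Replaced {dst}")
```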
The `layer.py` in the current `oft` directory implements the case where λ is not infinity. If you want to simulate λ being infinity, please replace `layer.py` with `layer_GS_HRA.py` and set the hyperparameter λ to 0 during training.
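Conceptually, a finite λ penalizes non-orthogonality among the reflection vectors, while λ = ∞ corresponds to enforcing orthogonality exactly (the Gram-Schmidt variant). A minimal sketch of such a penalty (our illustration; the actual loss lives in `layer.py`):

```python
import torch

def orthogonality_penalty(U: torch.Tensor) -> torch.Tensor:
    """||U_hat^T U_hat - I||_F^2 for column-normalized reflection vectors.

    U has shape (d, r), one Householder reflection vector per column. The
    penalty vanishes exactly when the columns are mutually orthogonal.
    """
    U_hat = U / U.norm(dim=0, keepdim=True)  # normalize each column
    gram = U_hat.T @ U_hat                   # (r, r) Gram matrix
    eye = torch.eye(U.shape[1], device=U.device, dtype=U.dtype)
    return ((gram - eye) ** 2).sum()

# In training, one would optimize: loss = task_loss + lam * orthogonality_penalty(U)
```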
The dataset we use for fine-tuning is MetaMathQA-40K, which can be downloaded through this link.
The model we use for fine-tuning is LLaMA 2, but you can choose whichever model you want to fine-tune.
Run the following code to complete the fine-tuning:
bash tune.sh
Please note that you need to change the dataset path and the path of the pre-trained model in `tune.sh`, and you can adjust the other parameters according to your needs. That is:
BASE_MODEL="YOUR_MODEL_PATH"
DATA_PATH="YOUR_DATA_PATH"
OUTPUT="YOUR_MODEL_SAVED_PATH"
After the training is complete, you can run the following command to test:
bash test.sh
Please note that you need to change the model paths in it:
BASE_MODEL="YOUR_MODEL_PATH"
OUTPUT="YOUR_MODEL_SAVED_PATH"
@article{yuan2024bridging,
title={Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation},
author={Yuan, Shen and Liu, Haotian and Xu, Hongteng},
journal={arXiv preprint arXiv:2405.17484},
year={2024}
}