RWKV-PEFT


RWKV-PEFT is the official implementation of parameter-efficient fine-tuning (PEFT) for RWKV5/6 models, supporting a variety of advanced fine-tuning methods across multiple hardware platforms.

Installation

[!IMPORTANT] Installation is mandatory.

git clone https://github.com/JL-er/RWKV-PEFT.git
cd RWKV-PEFT
pip install -r requirements.txt

Web Run

[!TIP] If you are using a cloud server (such as Vast or AutoDL), you can start the Streamlit service by referring to the help documentation on the cloud server's official website.

streamlit run web/app.py


Hardware Requirements

The following table shows memory usage on an RTX 4090 (24GB VRAM) with 64GB RAM, using --strategy deepspeed_stage_1 --ctx_len 1024 --micro_bsz 1 --lora_r 64:

| Model Size | Full Finetuning | LoRA/PISSA | QLoRA/QPISSA | State Tuning |
|------------|-----------------|------------|--------------|--------------|
| RWKV6-1.6B | OOM             | 7.4GB      | 5.6GB        | 6.4GB        |
| RWKV6-3B   | OOM             | 12.1GB     | 8.2GB        | 9.4GB        |
| RWKV6-7B   | OOM             | 23.7GB*    | 14.9GB**     | 18.1GB       |

Note:

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Run the example script:

    sh scripts/run_lora.sh

    Note: Please refer to the RWKV official tutorial for detailed data preparation; a rough sketch of the data format follows this list.

  3. Start the web GUI:

    [!TIP] If you're using cloud services (such as Vast or AutoDL), you'll need to enable web port access according to your service provider's instructions.

    streamlit run web/app.py
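As a rough illustration of the data side (the authoritative pipeline is described in the RWKV official tutorial; the "text" field convention and the binidx conversion step are assumptions based on common RWKV-LM practice), training data usually starts as a JSONL file with one sample per line:

    {"text": "User: ...\n\nAssistant: ..."}

which is then converted to the binidx format expected by the trainer.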

Main Features

Detailed Configuration

1. PEFT Method Selection

--peft bone --bone_config $bone_config
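Each PEFT method reads its settings from a JSON string passed through the matching config flag. As an illustrative sketch only (the key names here are assumptions; check the scripts under scripts/ for the exact keys your version expects), a LoRA run might configure:

    # Hypothetical LoRA config: rank, scaling alpha, dropout, optional checkpoint to resume from
    lora_config='{"lora_load":"","lora_r":64,"lora_alpha":128,"lora_dropout":0.01}'
    --peft lora --lora_config $lora_config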

2. Training Parts Selection

--train_parts ["time", "ln"]
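This flag unfreezes extra parts of the base model alongside the PEFT weights. As a hedged example (the part names follow the RWKV architecture; verify the accepted values in your version), also training the embedding and the output head would look like:

    --train_parts ["emb", "head", "time", "ln"]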

3. Quantized Training

--quant int8/nf4
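Quantizing the frozen base weights cuts VRAM use, as the QLoRA/QPISSA column in the hardware table above shows. A minimal sketch combining quantization with a PEFT method, reusing the flags from the section above:

    # nf4 (4-bit) uses the least memory; int8 keeps more precision at a higher memory cost
    --peft lora --lora_config $lora_config --quant nf4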

4. Infinite Length Training (infctx)

--train_type infctx --chunk_ctx 512 --ctx_len 2048
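Here --ctx_len is the total training length and --chunk_ctx is the slice processed at each step, so --chunk_ctx should be smaller than --ctx_len; the RWKV state is carried across chunks, which keeps memory bounded by the chunk size rather than the full sequence. For example, to train on 4096-token samples in 512-token chunks:

    --train_type infctx --chunk_ctx 512 --ctx_len 4096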

5. Data Loading Strategy

--dataload pad
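A rough summary of the modes (verify against your version): get is the default RWKV-LM-style random sampling, pad pads each sample to a fixed length so micro-batch sizes above 1 work, and only feeds one full sample per step. For example:

    --dataload pad --micro_bsz 4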

6. DeepSpeed Strategy

--strategy deepspeed_stage_1

Available strategies (the standard PyTorch Lightning DeepSpeed strategies):

- deepspeed_stage_1: ZeRO stage 1, shards optimizer states across GPUs
- deepspeed_stage_2: ZeRO stage 2, additionally shards gradients
- deepspeed_stage_2_offload: stage 2 with optimizer states and gradients offloaded to CPU
- deepspeed_stage_3: ZeRO stage 3, additionally shards model parameters
- deepspeed_stage_3_offload: stage 3 with offload to CPU

7. FLA Operator

By default, RWKV-PEFT uses custom CUDA kernels for wkv computation. However, you can use --fla to enable the Triton kernel:

--fla
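Putting these options together, here is a hedged end-to-end sketch of a fine-tuning invocation (the paths are placeholders, and train.py plus the --load_model and --proj_dir flags are assumed from the RWKV-LM training convention; scripts/run_lora.sh is the authoritative template):

    # Base checkpoint and output directory are placeholder paths;
    # flag names follow the sections above and the RWKV-LM convention.
    lora_config='{"lora_load":"","lora_r":64,"lora_alpha":128,"lora_dropout":0.01}'
    python train.py \
      --load_model /path/to/rwkv6-base.pth \
      --proj_dir /path/to/output \
      --peft lora --lora_config $lora_config \
      --quant nf4 --dataload pad \
      --strategy deepspeed_stage_1 \
      --ctx_len 1024 --micro_bsz 1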

GPU Support

Citation

If you find this project helpful, please cite our work:

@misc{kang2024boneblockaffinetransformation,
      title={Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models},
      author={Jiale Kang},
      year={2024},
      eprint={2409.15371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.15371}
}