RWKV-PEFT

[ English | 中文 ]

RWKV-PEFT is the official implementation for efficient parameter fine-tuning of RWKV5/6 models, supporting various advanced fine-tuning methods across multiple hardware platforms.

Installation

[!IMPORTANT] Installation is mandatory.

git clone https://github.com/JL-er/RWKV-PEFT.git
cd RWKV-PEFT
pip install -r requirements.txt

Web Run

[!TIP] If you are using a cloud server (such as Vast or AutoDL), you can start the Streamlit service by referring to the help documentation on the cloud server's official website.

streamlit run web/app.py

Hardware Requirements
Quick Start
Main Features
Detailed Configuration
GPU Support
Citation

Hardware Requirements

The following shows memory usage when using an RTX 4090 (24GB VRAM) + 64GB RAM (with parameters: --strategy deepspeed_stage_1 --ctx_len 1024 --micro_bsz 1 --lora_r 64):

Model Size	Full Finetuning	LoRA/PISSA	QLoRA/QPISSA	State Tuning
RWKV6-1.6B	OOM	7.4GB	5.6GB	6.4GB
RWKV6-3B	OOM	12.1GB	8.2GB	9.4GB
RWKV6-7B	OOM	23.7GB*	14.9GB**	18.1GB

Note:

OOM when batch size is 8 ** Requires 19.5GB VRAM when batch size is 8

Quick Start

Install dependencies:
```
pip install -r requirements.txt
```
Run example script:
```
sh scripts/run_lora.sh
```
Note: Please refer to the RWKV official tutorial for detailed data preparation
Start with web GUI:

[!TIP] If you're using cloud services (such as Vast or AutoDL), you'll need to enable web port access according to your service provider's instructions.

streamlit run web/app.py

Main Features

Multiple Fine-tuning Methods: Supports LoRA, PISSA, Bone, State Tuning, etc.
Quantized Training: Supports INT8/NF4 quantization for significant VRAM reduction
Flexible Data Loading: Supports various data sampling strategies
Memory Optimization: Multiple DeepSpeed strategies available
Loss Masking: Supports loss masking for QA dialogue and padding
Infinite Context Training: Supports infctx training mode, utilizing RWKV's constant memory usage advantage to train with "infinite" context under limited resources
Multi-Hardware Support: RWKV-PEFT officially supports NVIDIA, AMD, Moore Threads, Musa, Iluvatar CoreX, and other hardware platforms. Ascend NPU implementation will be available later. Note: Currently we only support issues for NVIDIA hardware
RWKV-FLA Efficient Training: rwkv-fla is a Triton-based linear attention operator that can run efficiently on hardware without CUDA support

Detailed Configuration

1. PEFT Method Selection

--peft bone --bone_config $lora_config

2. Training Parts Selection

--train_parts ["time", "ln"]

Available parts: emb, head, time, ln
Default training: time, ln (small parameter ratio)

3. Quantized Training

--quant int8/nf4

4. Infinite Length Training (infctx)

--train_type infctx --chunk_ctx 512 --ctx_len 2048

ctx_len: Target training length
chunk_ctx: Slice length, must be smaller than ctx_len

5. Data Loading Strategy

--dataload pad

get: Default random sampling (RWKV-LM style)
pad: Fixed-length padding sampling
only: Single data sampling (only supports bsz=1)

6. DeepSpeed Strategy

--strategy deepspeed_stage_1

Available strategies:

deepspeed_stage_1: Preferred option
deepspeed_stage_2/3: For large models or full fine-tuning
deepspeed_stage_2_offload
deepspeed_stage_3_offload

7. FLA Operator

By default, RWKV-PEFT uses custom CUDA kernels for wkv computation. However, you can use --fla to enable the Triton kernel:

--fla

GPU Support

NVIDIA: CUDA
Intel, Moore Threads, Musa, Iluvatar CoreX: FLA, which means you need to pass --fla
Ascend: CANN (soon)

Citation

If you find this project helpful, please cite our work:

@misc{kang2024boneblockaffinetransformation,
      title={Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models},
      author={Jiale Kang},
      year={2024},
      eprint={2409.15371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.15371}
}

JL-er / RWKV-PEFT

readme