BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

How to finetune RWKV? #94

Open kxzxvbk opened 1 year ago

kxzxvbk commented 1 year ago

Hi, thanks for your work :) I'm wondering how I can fine-tune RWKV given a pretrained model. I know there is one repo (https://github.com/Blealtan/RWKV-LM-LoRA) that uses LoRA for fine-tuning, but I suppose that repo is not good enough, for the following reasons:

zeroplum commented 1 year ago

https://github.com/BlinkDL/RWKV-v2-RNN-Pile

kxzxvbk commented 1 year ago

https://github.com/BlinkDL/RWKV-v2-RNN-Pile

What kind of fine-tuning method does this use? I think it tunes all the parameters in the model?

kxzxvbk commented 1 year ago

I found a good solution to this problem. Since the latest version of transformers supports RWKV, I can now use peft to fine-tune RWKV. Here is the demo code:

from transformers import AutoTokenizer, RwkvForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Apply LoRA only to the channel-mixing value projection.
target_modules = ["feed_forward.value"]
config = LoraConfig(
    r=4, lora_alpha=16, target_modules=target_modules, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
)

# Replace URL_OF_HUGGINGFACE with the Hub id of the RWKV checkpoint you want to tune.
tokenizer = AutoTokenizer.from_pretrained("URL_OF_HUGGINGFACE", trust_remote_code=True)
model = RwkvForCausalLM.from_pretrained("URL_OF_HUGGINGFACE", trust_remote_code=True)
model = prepare_model_for_int8_training(model)  # freeze base weights and prepare for low-precision training
lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()

# Quick sanity check: one forward pass with labels to get a loss.
lora_model.train()
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = lora_model(**inputs, labels=inputs["input_ids"])
loss, logits = outputs.loss, outputs.logits
print(loss, logits)
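
A rough sketch of wrapping this into a small training loop (the text samples, learning rate, and epoch count below are placeholders, adapt them to your own data):

import torch

# Placeholder data; in practice, load your own corpus here.
texts = ["Hello, my dog is cute.", "RWKV is an RNN with transformer-level performance."]
batches = [tokenizer(t, return_tensors="pt") for t in texts]

# Only the LoRA parameters require gradients after get_peft_model.
optimizer = torch.optim.AdamW(
    (p for p in lora_model.parameters() if p.requires_grad), lr=1e-4
)

lora_model.train()
for epoch in range(3):
    for batch in batches:
        outputs = lora_model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
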
muhammed-saeed commented 1 year ago

Assume that I have training data (JSON or TSV) in the format {"instruction": "THE INSTRUCTION", "input": "THE INPUT", "output": "DESIRED OUTPUT"}. How can I modify your peft code to work with this data?

kxzxvbk commented 1 year ago

Assume that I have training data (JSON or TSV) in the format {"instruction": "THE INSTRUCTION", "input": "THE INPUT", "output": "DESIRED OUTPUT"}. How can I modify your peft code to work with this data?

I hope this repo can help you: https://github.com/tatsu-lab/stanford_alpaca
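
If it helps, here is a rough sketch of turning data in that {"instruction", "input", "output"} format into plain training text using an Alpaca-style prompt template (the template wording and the train.json filename are assumptions, not something fixed by RWKV):

import json

# Alpaca-style prompt templates (assumed, adapt as needed).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

def build_text(example):
    # Use the "with input" template only when the input field is non-empty.
    template = PROMPT_WITH_INPUT if example.get("input") else PROMPT_NO_INPUT
    return template.format(**example)

with open("train.json") as f:  # placeholder file name
    records = json.load(f)

texts = [build_text(r) for r in records]
# Each text can then be tokenized and passed to the peft model as in the demo above:
#   inputs = tokenizer(texts[0], return_tensors="pt")
#   outputs = lora_model(**inputs, labels=inputs["input_ids"])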

muhammed-saeed commented 1 year ago

Assume that I have training data (JSON or TSV) in the format {"instruction": "THE INSTRUCTION", "input": "THE INPUT", "output": "DESIRED OUTPUT"}. How can I modify your peft code to work with this data?

I hope this repo can help you: https://github.com/tatsu-lab/stanford_alpaca

Thanks for your response. I have a question: can I use the same training code there, but instead of passing a LLaMA model, pass an RWKV model?

SetoKaiba commented 1 year ago

I found a good solution to this problem. Since the latest version of transformers supports RWKV, I can now use peft to fine-tune RWKV. Here is the demo code:

Can this code snippet be used to fine-tune the World model? It seems that the World model uses a different tokenizer and vocab list.

winglian commented 10 months ago

I found a good solution to this problem. Since the latest version of transformers supports RWKV, I can now use peft to fine-tune RWKV. Here is the demo code:

I assume this is only for RWKV-4? @BlinkDL, is there any timeline for getting RWKV-5 into transformers?

EasonXiao-888 commented 5 months ago

Hello, I want to fine-tune RWKV with a 4096 context length, but it raises an error here:

if seq_len > rwkv_cuda_kernel.max_seq_length:
    raise ValueError(
        f"Cannot process a batch with {seq_len} tokens at the same time, use a maximum of "
        f"{rwkv_cuda_kernel.max_seq_length} with this model."
    )

Have you encountered this, or do you know how to solve it?
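
No fix is given in this thread. One possible workaround, sketched below under the assumption that the Hugging Face RwkvForCausalLM forward pass accepts a state argument and returns an updated state, is to feed the long sequence in chunks that stay under the kernel limit and carry the recurrent state across chunks. Loading the model with a larger context_length, so the CUDA kernel is compiled for longer sequences, might be another option, but that is also an assumption.

from transformers import AutoTokenizer, RwkvForCausalLM

# Sketch only (assumption, not confirmed in this thread): chunk a long sequence and
# carry the recurrent state between chunks so that no single forward pass exceeds
# the CUDA kernel's max_seq_length.
tokenizer = AutoTokenizer.from_pretrained("URL_OF_HUGGINGFACE", trust_remote_code=True)
model = RwkvForCausalLM.from_pretrained("URL_OF_HUGGINGFACE", trust_remote_code=True)

long_text = "..." * 2048  # placeholder for a long training sample
input_ids = tokenizer(long_text, return_tensors="pt")["input_ids"]

chunk_len = 1024          # keep this at or below the kernel limit from the error message
state = None
total_loss = 0.0

model.train()
for start in range(0, input_ids.shape[1], chunk_len):
    chunk = input_ids[:, start:start + chunk_len]
    outputs = model(input_ids=chunk, labels=chunk, state=state, use_cache=True)
    state = outputs.state  # recurrent state carried over to the next chunk
    total_loss = total_loss + outputs.loss

total_loss.backward()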