How to train for long context

BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Apache License 2.0


How to train for long context #233

Open EasonXiao-888 opened 3 months ago

EasonXiao-888 commented 3 months ago

When I train an RWKV-v4 model with a context length of 4096, it fails with the error raised here:

    if seq_len > rwkv_cuda_kernel.max_seq_length:
        raise ValueError(
            f"Cannot process a batch with {seq_len} tokens at the same time, use a maximum of "
            f"{rwkv_cuda_kernel.max_seq_length} with this model."
        )

BlinkDL commented 3 months ago

Change T_MAX in model.py.
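
A minimal sketch of what that change amounts to, assuming T_MAX is a module-level constant in model.py that caps the sequence length the CUDA kernel accepts, and that it needs to be at least as large as the training context length (4096 here). The helper `check_seq_len` is hypothetical: it only mirrors the check quoted in the question and is not the repository's actual code.

```python
# Hypothetical sketch, not the actual RWKV-LM source.
# Assumption: T_MAX bounds the sequence length the kernel is built for,
# so it has to be >= the ctx_len used for training.
T_MAX = 4096  # raised to cover a 4096-token context


def check_seq_len(seq_len: int, t_max: int = T_MAX) -> None:
    """Raise ValueError if a batch is longer than the configured limit."""
    if seq_len > t_max:
        raise ValueError(
            f"Cannot process a batch with {seq_len} tokens at the same time, "
            f"use a maximum of {t_max} with this model."
        )


check_seq_len(4096)  # passes once T_MAX covers the context length
```

A larger T_MAX will probably increase VRAM use during training, so it may be worth setting it no higher than the context length actually needed.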