RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.
https://github.com/BlinkDL/RWKV-LM/blob/666f64591e13c68ed6e602e957c5ca47b25750e3/RWKV-v5/cuda/wkv6state_cuda.cu#L15
This line is missing the batch offset; it should read:
_s += b*H*_N_*_N_ + h*_N_*_N_ + i*_N_;
Probably why this code didn't work for BPTT when we tried it a while back!
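To make the bug concrete, here is a minimal host-side sketch of the offset arithmetic, assuming the state tensor `_s` is laid out row-major as `[B, H, N, N]` (the `B`, `H`, `N` values below are illustrative, not taken from the kernel):

```python
# Illustrative dimensions (not the kernel's actual launch configuration).
B, H, N = 2, 4, 64

def offset_buggy(b, h, i):
    # As on the linked line: no batch term, so every batch
    # reads and writes batch 0's state block.
    return h * N * N + i * N

def offset_fixed(b, h, i):
    # Proposed fix: _s += b*H*_N_*_N_ + h*_N_*_N_ + i*_N_;
    return b * H * N * N + h * N * N + i * N

# The buggy offset is identical regardless of the batch index b:
assert offset_buggy(0, 3, 5) == offset_buggy(1, 3, 5)

# The fixed offset separates batches by a full H*N*N state block:
assert offset_fixed(1, 3, 5) - offset_fixed(0, 3, 5) == H * N * N
```

With the batch term missing, all batches alias the same state memory, which would corrupt the carried state exactly in the multi-batch/BPTT setting described above.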