Hi, when following these instructions to run RWKV-v4neo with DDP, https://github.com/BlinkDL/RWKV-LM/blob/39a4d461a5102defd2a47f12b64b38466bf8ec4c/RWKV-v4neo/train.py#L23-L30, I got this error:
After digging into the code a little, I found that in the customized CUDA kernel, `u` is supposed to be a bf16 tensor: https://github.com/BlinkDL/RWKV-LM/blob/39a4d461a5102defd2a47f12b64b38466bf8ec4c/RWKV-v4neo/cuda/wkv_op_bf16.cpp#L5

But here `u` is a float: https://github.com/BlinkDL/RWKV-LM/blob/39a4d461a5102defd2a47f12b64b38466bf8ec4c/RWKV-v4neo/src/model.py#L60

A simple workaround is changing this line to `u = u.contiguous().bfloat16()`, and that works for me: https://github.com/BlinkDL/RWKV-LM/blob/39a4d461a5102defd2a47f12b64b38466bf8ec4c/RWKV-v4neo/src/model.py#L56
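For reference, here is a minimal sketch of the dtype mismatch and the cast. This is an illustrative stand-in, not the actual RWKV model code: `u` here is just a plain float32 parameter, and the assertion stands in for the check the bf16 kernel performs on its inputs.

```python
import torch

# Illustrative stand-in for the float32 parameter created in model.py
# (not the actual RWKV code).
u = torch.nn.Parameter(torch.zeros(8, dtype=torch.float32))

# The bf16 WKV kernel expects its input tensors to be bfloat16 and
# contiguous; a float32 tensor at this point is what triggers the error.
assert u.dtype == torch.float32

# The workaround: make the tensor contiguous and cast it to bfloat16
# before handing it to the custom CUDA op.
u_bf16 = u.contiguous().bfloat16()
assert u_bf16.dtype == torch.bfloat16
assert u_bf16.is_contiguous()
```

Note that `.bfloat16()` returns a new tensor rather than casting in place, so the result has to be assigned back (as in the `u = u.contiguous().bfloat16()` workaround above).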