RUN_CUDA_RWKV6 would be better implemented in PyTorch; otherwise it is hard to port

BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Apache License 2.0


RUN_CUDA_RWKV6 would be better implemented in PyTorch; otherwise it is hard to port #252

Open bobo-wmdigit opened 3 weeks ago

bobo-wmdigit commented 3 weeks ago

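I read the paper and the direction looks great, but the overall design is very unfriendly to people who want to do further research. Most people interested in this framework want to port it to edge devices, yet the core code is implemented in CUDA, which makes porting very painful and requires aligning the results by hand. It seems every version except v1 does this? I also tested the demo, and its handling of the stop token did not seem very good. For such a good theoretical framework, I suggest designing it so that it is easier for everyone to experiment with; only then does it have a chance of being deployed in practice. Just a suggestion.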

BlinkDL commented 2 weeks ago

Thanks for your interest. Inference does not require CUDA (although prefill is faster with CUDA): https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_v6_demo.py
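For readers who want to see what a CUDA-free version of this step could look like, here is a minimal sequential sketch in plain PyTorch. It is not the code from the linked demo or from the CUDA kernel. It assumes the per-head recurrence is `y_t = r_t (S + u * k_t^T v_t)` followed by `S = w_t * S + k_t^T v_t`, with `w_t` already converted to per-channel decay factors in (0, 1). The function name `wkv6_reference` and the tensor layout `(T, H, N)` are hypothetical choices made for illustration.

```python
import torch


def wkv6_reference(r, k, v, w, u, state=None):
    """Sequential WKV6-style recurrence in plain PyTorch (illustrative sketch).

    Assumed shapes (hypothetical layout, not taken from the repository):
      r, k, v, w: (T, H, N)  T tokens, H heads, N channels per head
      u:          (H, N)     bonus applied to the current token
      state:      (H, N, N)  running state, or None to start from zeros
    w must already hold decay factors in (0, 1).
    Returns (y, state) where y has shape (T, H, N).
    """
    T, H, N = r.shape
    if state is None:
        state = torch.zeros(H, N, N, dtype=r.dtype, device=r.device)
    outputs = []
    for t in range(T):
        # Outer product k_t^T v_t for every head: (H, N, 1) * (H, 1, N) -> (H, N, N)
        kv = k[t].unsqueeze(-1) * v[t].unsqueeze(-2)
        # Read out: the current token gets the bonus u, the past comes from the state
        y = torch.einsum("hi,hij->hj", r[t], state + u.unsqueeze(-1) * kv)
        outputs.append(y)
        # Decay the state per key channel, then add the current token
        state = w[t].unsqueeze(-1) * state + kv
    return torch.stack(outputs), state


if __name__ == "__main__":
    torch.manual_seed(0)
    T, H, N = 6, 2, 4
    r, k, v = (torch.randn(T, H, N) for _ in range(3))
    w = torch.rand(T, H, N)  # decay factors in [0, 1)
    u = torch.randn(H, N)
    y, s = wkv6_reference(r, k, v, w, u)
    print(y.shape, s.shape)
```

Because the loop carries an explicit state, a long prompt can be processed in chunks by passing the returned state into the next call. The loop runs one token at a time, so it is much slower than a fused kernel for prefill, but it uses only standard tensor operations and so is easier to port.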

And here is the chat demo (it uses \n\n as the stop sequence, because I replace every \n\n in the user input with \n): https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py
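The stop-sequence convention described above can be sketched as follows. This is not the code from API_DEMO_CHAT.py; the helper names `sanitize_user_input` and `truncate_at_stop` are hypothetical. One assumption goes beyond the comment: runs of three or more newlines are also collapsed to a single `\n`, so that no `\n\n` can survive in the user's text.

```python
import re

STOP_SEQUENCE = "\n\n"


def sanitize_user_input(text):
    """Collapse every run of two or more newlines into one newline,
    so the user's text can never contain the stop sequence."""
    return re.sub(r"\n{2,}", "\n", text)


def truncate_at_stop(generated):
    """Cut the model output at the first stop sequence, if there is one."""
    index = generated.find(STOP_SEQUENCE)
    return generated if index < 0 else generated[:index]


if __name__ == "__main__":
    print(repr(sanitize_user_input("line one\n\nline two\n\n\nline three")))
    print(repr(truncate_at_stop("Sure, here you go.\n\nUser: next question")))
```

Since the sanitized prompt never contains `\n\n`, the first `\n\n` in the generated text can only come from the model, which is what makes it usable as an end-of-turn marker.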

BlinkDL commented 1 week ago

Also, please see https://github.com/TorchRWKV/rwkv-kit