BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Main differences between versions? #120

Closed: BrightXiaoHan closed this issue 1 year ago

BrightXiaoHan commented 1 year ago

What are the main differences and improvements between V1, V2, V3, V4, and V4-neo, and are there any related documents or explanations available?

yhyu13 commented 1 year ago

It's probably a difference in training epochs? The dataset appears to be the same.

BrightXiaoHan commented 1 year ago

The model implementation also seems to differ between versions.

openRiemann commented 1 year ago

A README.md for each version of RWKV is needed.