RWKV (pronounced RWaKuV) is an RNN with GPT-level LLM performance that can also be directly trained like a GPT transformer (i.e., training is parallelizable).
RWKV is an open-source, non-profit group under the Linux Foundation, supported by our sponsors.
It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM usage, "infinite" context length, and free sentence embeddings. Moreover, it is 100% attention-free.
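Because inference is recurrent, the whole context is carried in a fixed-size state rather than a growing attention cache, which is where the "infinite" context length comes from. Below is a minimal sketch of stateful, token-by-token inference using the community `rwkv` pip package (from ChatRWKV); the checkpoint path is a placeholder, and `20B_tokenizer.json` is the tokenizer file shipped with the ChatRWKV repo.

```python
# Minimal sketch of RNN-mode inference with the community `rwkv` pip package
# (from ChatRWKV). The checkpoint path is a placeholder: download a v4 Pile
# model and the 20B_tokenizer.json file from the ChatRWKV repo first.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model='/path/to/RWKV-4-Pile-430M-20220808-8066', strategy='cpu fp32')
pipeline = PIPELINE(model, '20B_tokenizer.json')  # tokenizer used by v4 Pile models

# The entire context lives in this fixed-size recurrent state: no KV cache,
# so memory use does not grow with sequence length.
state = None
for token in pipeline.encode('RWKV is'):
    out, state = model.forward([token], state)  # O(1) work per token

# Greedily decode a few tokens, reusing the carried state.
for _ in range(16):
    token = int(out.argmax())
    print(pipeline.decode([token]), end='')
    out, state = model.forward([token], state)
```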
| Version | v4 - Raven | v4 - Dove | v5 - Eagle | v6 - Finch |
|---|---|---|---|---|
| Paper | Paper Accepted @ EMNLP 2023 | (no architecture change) | 🔧 stable (current version) | 🧪 prototype |
| Overall Status | EOL - Recommended to use v5 World instead | EOL - Recommended to use v5 World instead | ✅ General Availability | 🧪 Early Training |
| 0.4B model | Fully Trained: rwkv-pile-430m | Fully Trained | ✅ Fully Trained | 🧪 Early Training |
| 1.5B model | Fully Trained: rwkv-raven-1b5 | Fully Trained | ✅ Fully Trained | 🧪 Early Training |
| 3B model | Fully Trained: rwkv-raven-3b | Fully Trained | ✅ Fully Trained | ... |
| 7B model | Fully Trained: rwkv-raven-7b | Fully Trained | ✅ Fully Trained | ... |
| 14B model / 7B 2T model | Fully Trained: rwkv-raven-14b | not planned | scheduled | ... |
| 8x7B MoE model | not planned | not planned | scheduled | ... |
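Several of the fully trained checkpoints above are published on the Hugging Face Hub, so they can be loaded through `transformers`. The following is a hedged sketch, not an official quickstart: it assumes the hub ID `RWKV/rwkv-raven-1b5` for the 1.5B Raven model, a `transformers` version with RWKV support (>= 4.29), and an instruction-style prompt format.

```python
# Hedged sketch: loading one of the v4 Raven checkpoints above via Hugging Face
# transformers (>= 4.29, which added RWKV support). The hub ID and the
# instruction-style prompt below are assumptions, not part of this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'RWKV/rwkv-raven-1b5'  # assumed hub ID for the 1.5B Raven model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = '### Instruction: Explain what RWKV is.\n### Response:'
inputs = tokenizer(prompt, return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```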
RWKV is made possible, as an open-source project, thanks to the large amount of GPU compute and researcher time contributed by:
Without their invaluable support, we would not have been able to develop the core RWKV foundation models that you see today.
In addition, we would like to thank:
For helping with GPU time on smaller experiments, finetunes, and various models, especially the models from failed runs that were never publicly released.