RWKV centralised docs for the community

RWKV Language Model

RWKV (pronounced as RWaKuV) is an RNN with GPT-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable).

RWKV is an open-source, non-profit project under the Linux Foundation, supported by our sponsors.

It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM usage, "infinite" context length, and free sentence embeddings. Moreover, it is 100% attention-free.
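As a concrete illustration of the RNN side of this claim, here is a minimal sketch of recurrent inference using the community `rwkv` pip package (`pip install rwkv`). The checkpoint path is a placeholder, and the exact API may differ slightly across package versions.

```python
# Minimal sketch: RNN-mode inference with the `rwkv` pip package (pip install rwkv).
# The checkpoint path is a placeholder - download an RWKV .pth checkpoint first.
import os
os.environ["RWKV_JIT_ON"] = "1"   # enable the TorchScript kernels
os.environ["RWKV_CUDA_ON"] = "0"  # set to "1" to build the optional CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model="/path/to/RWKV-5-World-1B5.pth", strategy="cpu fp32")  # placeholder path
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # tokenizer used by the World models

# RNN mode: the entire context is folded into a fixed-size state, so memory stays
# constant no matter how long the prompt grows ("infinite" ctxlen in practice).
state = None
out, state = model.forward(pipeline.encode("RWKV is an RNN with"), state)
out, state = model.forward(pipeline.encode(" GPT-level performance"), state)  # resumes from the same state

next_token = pipeline.sample_logits(out, temperature=1.0, top_p=0.7)
print(pipeline.decode([next_token]))
```

The same weights can also be run in the parallel ("GPT") mode, which is what makes RWKV trainable like a transformer.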

RWKV architecture paper

Current Version Status

| Version | v4 - Raven | v4 - Dove | v5 - Eagle | v6 - Finch |
| --- | --- | --- | --- | --- |
| Paper | 🎓 Paper Accepted @ EMNLP 2023 | (no architecture change) | 🔧 stable (current version) | 🧪 prototype |
| Overall Status | 🌚 EOL - Recommended to use v5 world instead | 🌚 EOL - Recommended to use v5 world instead | ✅ General Availability | 🧪 Early Training |
| 0.4B model | Fully Trained: rwkv-pile-430m | Fully Trained | ✅ Fully Trained | 🧪 Early Training |
| 1.5B model | Fully Trained: rwkv-raven-1b5 | Fully Trained | ✅ Fully Trained | 🧪 Early Training |
| 3B model | Fully Trained: rwkv-raven-3b | Fully Trained | ✅ Fully Trained | 🧪 Early Training |
| 7B model | Fully Trained: rwkv-raven-7b | Fully Trained | ✅ Fully Trained | ... |
| 14B model / 7B 2T model | Fully Trained: rwkv-raven-14b | not-planned | scheduled | ... |
| 8x7B MoE model | not-planned | not-planned | scheduled | ... |
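For the v4 checkpoints in the table, Hugging Face `transformers` (v4.29+) also includes an RWKV implementation, so they can be tried with the standard causal-LM API. Below is a hedged sketch, assuming the Hub repo id `RWKV/rwkv-raven-1b5` matches the 1.5B Raven entry above and a typical instruction-style prompt; check the model card for the exact repo name and prompt format.

```python
# Hedged sketch: running the 1.5B v4 Raven checkpoint through Hugging Face transformers.
# "RWKV/rwkv-raven-1b5" is assumed from the table's model name; verify the repo id on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-1b5")
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-1b5")

prompt = "### Instruction:\nWrite a short haiku about mountains.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Same weights, GPT-style parallel path: convenient for quick evaluation and fine-tuning.
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```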

TL;DR vs existing transformer models

Good

Bad

Who sponsors the compute for RWKV?

RWKV is made possible, as an open-source project, thanks to the large amount of GPU compute and researchers' time contributed by:

Without their invaluable support, we would not have been able to develop the core RWKV foundation models that you see today.


In addition, we would like to thank

For helping with GPU time on smaller experiments, finetunes, and various models, especially those that were never publicly released due to failed runs.

Quick RWKV community terminology

Which RWKV models should I be using?