Open utterances-bot opened 2 months ago
In-depth explanation of the math behind the RWKV model, with PyTorch implementations, plus a discussion of numerical stability.
https://ben.bolte.cc/rwkv-model
In the first implementation (vanilla), shouldn't it be `ew = exp(+w)` instead of `exp(-w)`? I'm comparing it against the "GPT formulation" in https://github.com/BlinkDL/RWKV-CUDA/tree/main.
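To make the sign question concrete, here is a minimal scalar sketch of a WKV-style recurrence. It is only an illustration under stated assumptions, not the blog's or the repository's code: `wkv_recurrent`, `w_pos`, and `w_neg` are hypothetical names, and the sketch does not say which convention either source actually uses. It only shows that `exp(-w)` with a positive `w` and `exp(+w)` with a pre-negated `w` give the same decay factor, so the answer depends on how `w` is stored.

```python
import math

def wkv_recurrent(decay_factor, u, ks, vs):
    """Scalar WKV-style recurrence (hypothetical helper for illustration).

    decay_factor multiplies the running numerator and denominator at each
    step, so it should lie in (0, 1) for past tokens to decay.
    """
    num, den = 0.0, 0.0
    out = []
    for k, v in zip(ks, vs):
        ek = math.exp(k)
        euk = math.exp(u + k)  # the current token gets the bonus term u
        out.append((num + euk * v) / (den + euk))
        # Decay the accumulated past, then add the current token.
        num = decay_factor * num + ek * v
        den = decay_factor * den + ek
    return out

# Assumption: the same decay stored under two sign conventions.
w_pos = 0.5      # decay rate kept as a positive number
w_neg = -w_pos   # decay rate stored already negated

u = 0.2
ks = [0.1, -0.3, 0.7]
vs = [1.0, 2.0, -1.0]

a = wkv_recurrent(math.exp(-w_pos), u, ks, vs)  # exp(-w) with positive w
b = wkv_recurrent(math.exp(+w_neg), u, ks, vs)  # exp(+w) with negated w
```

Both calls use the same decay factor, so `a` and `b` match. If the factor came out greater than 1 in either code base, that would point to a real sign mismatch rather than a difference in convention.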
RWKV Language Model Math | Ben's Blog