codekansas / codekansas.github.io

:computer: Personal blog
https://ben.bolte.cc/
MIT License

rwkv-model #10

Open utterances-bot opened 2 months ago

utterances-bot commented 2 months ago

RWKV Language Model Math | Ben's Blog

In-depth explanation of the math behind the RWKV model, with PyTorch implementations, plus a discussion of numerical stability.

https://ben.bolte.cc/rwkv-model

niklasnolte commented 2 months ago

In the first implementation (vanilla), shouldn't it be `ew = exp(+w)` instead of `exp(-w)`? I'm comparing against the "GPT formulation" in https://github.com/BlinkDL/RWKV-CUDA/tree/main.
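
For anyone checking the sign themselves, here is a minimal sketch in plain Python (not the blog's PyTorch code). It assumes the WKV sum weights past token `i` at step `t` by `exp(-(t-1-i)*w + k_i)` and the current token by `exp(u + k_t)`, with `w` a non-negative decay rate. The function names and the `sign` parameter are hypothetical, made up for this comparison. Under that assumption the recurrent form only matches the direct sum when the state is scaled by `exp(-w)`. If `w` instead stores the negated decay, the same factor is written `exp(+w)`, so the answer likely depends on which convention each code base uses.

```python
import math


def wkv_direct(w, u, ks, vs):
    """Direct (attention-style) sum, assuming w >= 0 is the decay rate."""
    out = []
    for t in range(len(ks)):
        # The current token gets the bonus u instead of a decay.
        num = math.exp(u + ks[t]) * vs[t]
        den = math.exp(u + ks[t])
        for i in range(t):
            # Past token i is decayed by (t - 1 - i) steps.
            weight = math.exp(-(t - 1 - i) * w + ks[i])
            num += weight * vs[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(w, u, ks, vs, sign=-1.0):
    """Recurrent form; `sign` (hypothetical) picks exp(-w) or exp(+w)."""
    ew = math.exp(sign * w)
    a = b = 0.0  # running numerator and denominator
    out = []
    for k, v in zip(ks, vs):
        cur = math.exp(u + k)
        out.append((a + cur * v) / (b + cur))
        # Decay the state one step, then absorb the current token.
        a = ew * a + math.exp(k) * v
        b = ew * b + math.exp(k)
    return out


w, u = 0.5, 0.3
ks = [0.1, -0.2, 0.4, 0.0]
vs = [1.0, 2.0, -1.0, 0.5]

direct = wkv_direct(w, u, ks, vs)
minus = wkv_recurrent(w, u, ks, vs, sign=-1.0)  # state scaled by exp(-w)
plus = wkv_recurrent(w, u, ks, vs, sign=+1.0)   # state scaled by exp(+w)

print(max(abs(d - m) for d, m in zip(direct, minus)))
print(max(abs(d - p) for d, p in zip(direct, plus)))
```

The first printed gap should be at floating-point noise level and the second clearly non-zero. This version has no numerical-stability handling, so it is only suitable for small inputs like these.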