BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Probable mistake in Eq. 16 in the preprint #238

Open zeyun-zhong opened 5 months ago

zeyun-zhong commented 5 months ago

Thank you for the great work.

I have checked the code and the paper, and I believe that the "lerp" in Eq. 16 might actually be "ddlerp".
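For readers following along, the distinction between the two interpolation forms can be sketched as below. This is a minimal NumPy illustration based on the paper's definitions, not the repo's actual kernel; the parameter names (`mu`, `mu_x`, `lam`, `A`, `B`) are illustrative:

```python
import numpy as np

def lerp(a, b, mu):
    # Plain token-shift interpolation: a + (b - a) * mu,
    # where mu is a learned per-channel constant.
    return a + (b - a) * mu

def ddlerp(a, b, mu_x, lam, A, B):
    # Data-dependent lerp: the mixing amount itself depends on the
    # inputs through a low-rank projection (lam + tanh(x @ A) @ B),
    # so the shift amount varies per token rather than being fixed.
    x = a + (b - a) * mu_x
    mu = lam + np.tanh(x @ A) @ B
    return a + (b - a) * mu
```

The point of the issue is that Eq. 16 writes the plain `lerp` form, while the code computes the data-dependent `ddlerp` form.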

UEA-GCG commented 5 months ago

I also find that the code uses the ddlerp method. In addition, I think Eq. 19 doesn't match Table 7.

SmerkyG commented 5 months ago

@zeyun-zhong You are correct, thanks for noticing this! We will update the paper.

@UEA-GCG Thank you for finding this error. I believe you are pointing out that the cumprod in Eq. 19 is indexed incorrectly. (It should go from j=i+1 to t-1, since it's really a reverse cumprod.) Please let me know if you see any other issues with this formula.
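For concreteness, the corrected indexing can be sketched with a naive loop: each past position i is weighted by the product of decays over j = i+1 .. t-1 (a reverse cumulative product), while the current position t gets the bonus u. This is a hedged NumPy illustration of that indexing only, not the repo's CUDA kernel, and the function name is my own:

```python
import numpy as np

def wkv_term(w, k, v, u, t):
    # w, k, v: (T, D) arrays of per-token decays, keys, and values.
    # Returns the (D, D) outer-product sum for (0-based) position t:
    #   sum_{i<t} diag(prod_{j=i+1}^{t-1} w_j) k_i^T v_i + diag(u) k_t^T v_t
    D = k.shape[1]
    out = np.zeros((D, D))
    for i in range(t):
        # Product over the corrected range j = i+1 .. t-1; an empty
        # slice (i = t-1) yields the multiplicative identity.
        decay = np.prod(w[i + 1:t], axis=0)
        out += np.outer(decay * k[i], v[i])
    out += np.outer(u * k[t], v[t])
    return out
```

Note that for i = t-1 the product is empty, i.e. the most recent past token is undecayed, which is exactly what the j = i+1 .. t-1 range encodes.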