RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
Does RWKV only show lower GPU memory occupancy during inference? #250
I tried to use RWKV (e.g., Vision-RWKV) for CV tasks, but I found that during training RWKV shows GPU memory occupancy similar to a full-attention Transformer (like ViT). Both the RWKV and Vision-RWKV papers only report memory occupancy for inference.
The high memory consumption is a problem for my tasks. Do you have any advice?
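For reference, here is a minimal sketch of how one might compare peak GPU memory for a training step versus an inference step. It assumes PyTorch; `model` and `batch` are placeholders for any backbone and input, not anything from the RWKV or Vision-RWKV codebases:

```python
import torch

def peak_memory_mb(model, batch, train=True):
    """Return peak GPU memory (MiB) for one training or inference step."""
    model = model.cuda()
    batch = batch.cuda()
    torch.cuda.reset_peak_memory_stats()

    if train:
        model.train()
        out = model(batch)
        # Activations kept for backprop typically dominate training memory.
        out.sum().backward()
        model.zero_grad(set_to_none=True)
    else:
        model.eval()
        with torch.no_grad():
            # Only weights plus the current layer's activations are live.
            model(batch)

    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20
```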