BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of the RNN and the transformer: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Initializing single layer #137

Closed: DanielRoeder1 closed this 1 year ago

DanielRoeder1 commented 1 year ago

Hello,

I am planning to use a single Block of the RWKV architecture on top of another model. As I am fairly new to using CUDA kernels, I am not sure how to correctly initialize a single RWKV block alongside my model.

The idea is that the block receives a collection of tokens, which it uses to build its internal state; a rough sketch of what I mean is below.
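
For reference, here is a minimal, untested sketch of what I have in mind. `RWKVBlock` below is a simplified pure-PyTorch stand-in I wrote from the paper's WKV recurrence, not the actual `Block` class in this repo's src/model.py (which needs the custom CUDA kernel and the full training `args`); token-shift is omitted for brevity, and all names here are my own.

```python
# Sketch only: a single RWKV-style block that consumes hidden states from
# another model and carries a recurrent state across calls. Not the repo's API.
import torch
import torch.nn as nn

class RWKVBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)
        # time-mixing projections
        self.key = nn.Linear(dim, dim, bias=False)
        self.value = nn.Linear(dim, dim, bias=False)
        self.receptance = nn.Linear(dim, dim, bias=False)
        self.output = nn.Linear(dim, dim, bias=False)
        # per-channel decay (w) and bonus for the current token (u)
        self.time_decay = nn.Parameter(torch.zeros(dim))
        self.time_first = nn.Parameter(torch.zeros(dim))
        # channel-mixing FFN with squared ReLU
        self.ffn_k = nn.Linear(dim, 4 * dim, bias=False)
        self.ffn_v = nn.Linear(4 * dim, dim, bias=False)
        self.ffn_r = nn.Linear(dim, dim, bias=False)

    def forward(self, x, state=None):
        # x: (B, T, C) hidden states from the upstream model
        B, T, C = x.shape
        if state is None:
            # (numerator, denominator) accumulators of the WKV recurrence
            state = (torch.zeros(B, C, device=x.device),
                     torch.zeros(B, C, device=x.device))
        a, b = state
        xn = self.ln1(x)
        k = self.key(xn)
        v = self.value(xn)
        r = torch.sigmoid(self.receptance(xn))
        w = torch.exp(-torch.exp(self.time_decay))  # decay in (0, 1)
        eu = torch.exp(self.time_first)             # bonus for current token
        out = []
        for t in range(T):  # recurrent loop, one token at a time
            kt, vt = torch.exp(k[:, t]), v[:, t]
            wkv = (a + eu * kt * vt) / (b + eu * kt + 1e-8)
            out.append(r[:, t] * wkv)
            a = w * a + kt * vt   # decay past, accumulate current token
            b = w * b + kt
        x = x + self.output(torch.stack(out, dim=1))
        # channel mixing
        xn = self.ln2(x)
        ff = self.ffn_v(torch.square(torch.relu(self.ffn_k(xn))))
        x = x + torch.sigmoid(self.ffn_r(xn)) * ff
        return x, (a, b)

if __name__ == "__main__":
    blk = RWKVBlock(dim=768)
    h = torch.randn(2, 16, 768)              # e.g. hidden states from my base model
    y, state = blk(h)                        # build state from a collection of tokens
    y2, state = blk(torch.randn(2, 4, 768), state)  # continue from that state
```

The Python loop over time steps is only for clarity; as I understand it, the repo replaces this recurrence with its fused CUDA WKV kernel for speed, which is the part I am unsure how to set up standalone.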

I would be thankful for any guidance.

BlinkDL commented 1 year ago

Hi, you can ask in the Discord: https://discord.gg/bDSBUMeFpc