FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Support for RWKV language model #1

Open BlinkDL opened 1 year ago

BlinkDL commented 1 year ago

FlexGen looks great :)

Would you like to support the RWKV language model? It's an RNN (actually a linear transformer with both a GPT mode and an RNN mode, so quite similar to a usual GPT) with GPT-level performance. It has no attention, so it's faster and saves VRAM. And there is already a 14B-parameter model:

https://github.com/BlinkDL/ChatRWKV
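To illustrate why the RNN mode avoids attention's VRAM cost: a WKV-style linear-attention recurrence keeps only a small running state per channel instead of a growing key/value cache. Below is a minimal single-channel sketch of that idea; the function name, `w` (decay), and `u` (current-token bonus) are illustrative simplifications, not ChatRWKV's actual API.

```python
import math

def wkv_recurrence(ks, vs, w=0.1, u=0.5):
    """Sketch of an RWKV-style WKV recurrence for one channel.

    ks, vs: per-timestep key and value scalars.
    w: positive decay rate; u: bonus applied to the current token.
    The state (a, b) is a running exp-weighted sum of values and
    weights, so each step is O(1) -- no attention matrix over the
    whole past, hence constant memory in sequence length.
    """
    a, b = 0.0, 0.0          # numerator / denominator state
    outs = []
    for k, v in zip(ks, vs):
        e = math.exp(k + u)                 # current token, with bonus
        outs.append((a + e * v) / (b + e))  # weighted average of values
        decay = math.exp(-w)
        a = decay * a + math.exp(k) * v     # fold token into the state
        b = decay * b + math.exp(k)
    return outs
```

Each output is a softmax-like weighted average of past and current values, which is what makes the GPT-mode (parallel) and RNN-mode (sequential) formulations agree.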

You are welcome to join the RWKV Discord if you are interested :)