FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Support for RWKV language model #1

Open BlinkDL opened 1 year ago

BlinkDL commented 1 year ago

FlexGen looks great :)

Would you like to support the RWKV language model? It's an RNN (actually a linear transformer with both a GPT mode and an RNN mode, so quite similar to a usual GPT) with GPT-level performance. It has no attention, so it's faster and saves VRAM. And there is already a 14B-parameter model:

https://github.com/BlinkDL/ChatRWKV
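To illustrate why the RNN mode avoids attention's VRAM cost: a WKV-style linear-attention recurrence keeps only a small running state per channel instead of a growing key/value cache. Below is a minimal single-channel sketch of that idea; the function name, `w` (decay), and `u` (current-token bonus) are illustrative simplifications, not ChatRWKV's actual API.

```python
import math

def wkv_recurrence(ks, vs, w=0.1, u=0.5):
    """Sketch of an RWKV-style WKV recurrence for one channel.

    ks, vs: per-timestep key and value scalars.
    w: positive decay rate; u: bonus applied to the current token.
    The state (a, b) is a running exp-weighted sum of values and
    weights, so each step is O(1) -- no attention matrix over the
    whole past, hence constant memory in sequence length.
    """
    a, b = 0.0, 0.0          # numerator / denominator state
    outs = []
    for k, v in zip(ks, vs):
        e = math.exp(k + u)                 # current token, with bonus
        outs.append((a + e * v) / (b + e))  # weighted average of values
        decay = math.exp(-w)
        a = decay * a + math.exp(k) * v     # fold token into the state
        b = decay * b + math.exp(k)
    return outs
```

Each output is a softmax-like weighted average of past and current values, which is what makes the GPT-mode (parallel) and RNN-mode (sequential) formulations agree.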

You are welcome to join the RWKV Discord if you are interested :)