FlexGen looks great :)
Would you like to support the RWKV language model? It's an RNN (actually a linear transformer with both a GPT mode and an RNN mode, so quite similar to a usual GPT) with GPT-level performance. It uses no attention, so it is faster and saves VRAM. And there is already a 14B-parameter model:
https://github.com/BlinkDL/ChatRWKV
You are welcome to join the RWKV Discord if you are interested :)