BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.

Implement GPTQ for RWKV #88

Closed. 3outeille closed this issue 1 year ago.

3outeille commented 1 year ago

@BlinkDL Hi, I am willing to dedicate some time to implementing GPTQ for RWKV; is that okay?

BlinkDL commented 1 year ago

This is exactly what we need :) Please work on ChatRWKV

And please take a look at https://github.com/hahnyuan/RPTQ4LLM

And you only need quantization for the matrix*vector products (ignore all the time_xxx parameters; they have to stay in fp32, and they account for a tiny amount of computation).
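
For illustration, here is a minimal PyTorch sketch of that split. The parameter names and helper functions are hypothetical, and simple per-row symmetric int8 quantization stands in for a full GPTQ pass, which would additionally use calibration data and Hessian-based error correction:

```python
import torch

def quantize_rwkv_state_dict(state_dict):
    """Split a state dict into int8-quantized matrix weights and fp32 leftovers."""
    q_weights, scales, fp32_params = {}, {}, {}
    for name, w in state_dict.items():
        # time_mix / time_decay / time_first etc. are tiny 1-D tensors: keep fp32
        if "time_" in name or w.dim() != 2:
            fp32_params[name] = w.float()
            continue
        # symmetric per-output-row scale mapping max|w| to the int8 range
        s = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        q_weights[name] = torch.round(w / s).to(torch.int8)
        scales[name] = s
    return q_weights, scales, fp32_params

def int8_matvec(q_w, scale, x):
    """Dequantize on the fly and do the matrix*vector product."""
    return (q_w.float() * scale) @ x

# Toy usage with hypothetical RWKV-style parameter names:
sd = {
    "blocks.0.att.key.weight": torch.randn(64, 64),
    "blocks.0.att.time_decay": torch.randn(64),
}
qw, sc, fp32 = quantize_rwkv_state_dict(sd)
x = torch.randn(64)
ref = sd["blocks.0.att.key.weight"] @ x
approx = int8_matvec(qw["blocks.0.att.key.weight"], sc["blocks.0.att.key.weight"], x)
print("max abs error:", (ref - approx).abs().max().item())
```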

3outeille commented 1 year ago

See https://github.com/BlinkDL/ChatRWKV/pull/98 for any questions related to this topic.