RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
MIT License

Decrease memory padding for serial and sequential contexts #132

Closed by saharNooby 1 year ago

saharNooby commented 1 year ago

For sequence length = 16, padding memory was decreased from 1024 MB to 640 MB.
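
To put the figures above in perspective, here is a small sketch of the arithmetic. The helper name `padding_savings` is hypothetical and is not part of rwkv.cpp; only the two memory figures (1024 MB and 640 MB) come from this PR.

```python
def padding_savings(before_mb: float, after_mb: float) -> tuple[float, float]:
    """Return (MB saved, fraction of the original padding saved).

    Hypothetical helper for illustration only; not an rwkv.cpp API.
    """
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    saved = before_mb - after_mb
    return saved, saved / before_mb


# Figures reported in this PR for sequence length = 16.
saved_mb, fraction = padding_savings(1024, 640)
print(f"saved {saved_mb:.0f} MB ({fraction:.1%})")  # prints "saved 384 MB (37.5%)"
```

That is, the change frees 384 MB of padding at this sequence length, a bit more than a third of the previous amount.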