LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Support BLAS for RWKV 4 Raven models #152

Closed ghost closed 1 year ago

ghost commented 1 year ago

Hi, I use an Android device to run koboldcpp. BLAS is working as expected, even with the new RedPajama models.

However, I am testing RWKV-4-Raven-3B-v11-Eng99-Other1-20230425-ctx4096-ggml-q5_1.bin, and BLAS is not used to process the prompts; it falls back to regular processing.

I expect regular processing for short prompts, but currently larger prompts (200 tokens) are also processed without BLAS.

Thank you for any direction on this issue.

LostRuins commented 1 year ago

BLAS is currently not supported for RWKV. They are working on a different parallel acceleration approach, which you can follow in the rwkv.cpp repo.

paryska99 commented 1 year ago

I was also just now wondering whether that's possible. Thank you for the answer, @LostRuins.