antimatter15 / alpaca.cpp

Locally run an Instruction-Tuned Chat-Style LLM
MIT License
10.26k stars 912 forks source link

Improve ALPACA speed using GPU #144

Open multimediaconverter opened 1 year ago

multimediaconverter commented 1 year ago

Take a look at this project:

https://github.com/Const-me/Whisper

It is a Windows port of ggerganov's whisper.cpp implementation using DirectCompute -- another name for that technology is "compute shaders in Direct3D 11".

The author claims it shouldn't be hard to support another ML model, since the compute shaders and relevant infrastructure are already implemented in that project.

I suppose this library could significantly improve the speed of ALPACA chat.

openMolNike commented 1 year ago

I am not an expert, but I suspect that even though the GPU is faster, the weights of the neural network must fit in GPU memory for this to work. The NVIDIA RTX 4090 has 24 GB, so the 30B model will not fit there, but 13B works well on the CPU.

fenixlam commented 1 year ago

Add a parameter to let the user choose between GPU and CPU... even DirectML XD would be the best choice.