Closed paryska99 closed 1 year ago
Is it possible to make prompt processing faster with the help of a GPU device, just as cuBLAS or CLBlast can with CPU-hosted Llama models or others?
It is possible, but it would require implementing a sequence processing mode. Currently, only RNN mode is implemented, that is, processing token by token.
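To illustrate the distinction: in RNN mode the prompt is consumed strictly sequentially, each step depending on the previous hidden state, so there is no large batched matrix multiplication for a GPU BLAS library to accelerate. The sketch below is a toy illustration of that idea, not the project's actual API; `rnn_step` is a hypothetical stand-in for one layer's recurrence.

```python
def rnn_step(state, token):
    # Toy recurrence standing in for one layer step; the real state
    # update in an RWKV-style model is of course far more involved.
    return state * 0.5 + token

def process_prompt_rnn(tokens):
    # RNN mode: N tokens require N strictly sequential small steps,
    # so per-token work is too small for cuBLAS/CLBlast to help much.
    state = 0.0
    for t in tokens:
        state = rnn_step(state, t)
    return state

# A sequence processing mode would instead evaluate the whole prompt
# through batched matrix operations, which is the shape of work that
# GPU BLAS backends accelerate well.
```

A sequence mode would have to produce the same final state as the loop above while restructuring the computation into larger batched operations.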