I'm trying to integrate the low-level API into my own program. Loading the model (I am using Pygmalion-13B.ggmlv3.Q6_K.gguf) works fine and I get no errors, but when I try to evaluate the model via llama_cpp.llama_eval I get:
I've tried casting the parameters to C integers, as shown in the error log snippet above, and plain Python integers too. self.context is of type llama_cpp.llama_context_p, and self.NTHREADS is retrieved via multiprocessing, as in the low-level API example in this repository.
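For reference, this is roughly the casting I mean. It's only a sketch of how I'm preparing the arguments with ctypes before the call; the token ids and thread count are made-up placeholders, and the llama_cpp.llama_eval call itself is commented out since it needs a loaded context:

```python
import ctypes

tokens = [1, 15043, 3186]   # placeholder token ids, just for illustration
n_past = 0
n_threads = 4               # in my code this comes from multiprocessing.cpu_count()

# the token list is passed as a C array of 32-bit ints, not a Python list
c_tokens = (ctypes.c_int32 * len(tokens))(*tokens)

# the scalar arguments cast to plain C ints
c_n_tokens = ctypes.c_int(len(tokens))
c_n_past = ctypes.c_int(n_past)
c_n_threads = ctypes.c_int(n_threads)

# the actual call would then be (self.context is the loaded llama_context_p):
# llama_cpp.llama_eval(self.context, c_tokens, c_n_tokens, c_n_past, c_n_threads)
print(list(c_tokens), c_n_tokens.value)
```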
My only guess is that it's a problem with the model itself; unfortunately, I don't have another gguf model at hand to test this theory.