Sorry, I am not familiar with the library. I want to run a matmul between a tensor created by PyTorch and a Q4_0 (q40) weight read from a GGUF file.
I can read the weight from the GGUF file and convert it to a PyTorch tensor.
But I have no idea how to run the matmul between the PyTorch tensor and the Q4_0 weight with a CUDA kernel.
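One simple, non-fused approach is to dequantize the Q4_0 bytes into a regular float tensor and then let PyTorch run an ordinary matmul, which executes on the GPU when the tensors live on a CUDA device. The sketch below is under stated assumptions, not llama.cpp's own kernel. It assumes the raw Q4_0 data is already available as a flat `uint8` tensor (for example via `torch.from_numpy` on the bytes read from the GGUF file). It also follows ggml's Q4_0 block layout as I understand it: 32 weights per 18-byte block, consisting of a little-endian fp16 scale `d` followed by 16 bytes of packed nibbles. The low nibble of byte `j` is element `j`, the high nibble is element `j + 16`, and each value is `(q - 8) * d`. The weight is assumed to be stored row-major as `(out_features, in_features)` with blocks running along `in_features`. The helper names `dequantize_q4_0` and `q4_0_linear` are hypothetical.

```python
import torch

QK4_0 = 32                      # weights per Q4_0 block (ggml layout, assumed)
BLOCK_BYTES = 2 + QK4_0 // 2    # fp16 scale + 16 bytes of packed 4-bit quants = 18


def dequantize_q4_0(raw: torch.Tensor, rows: int, cols: int) -> torch.Tensor:
    """Hypothetical helper: unpack raw Q4_0 bytes into a (rows, cols) float32 tensor."""
    assert raw.dtype == torch.uint8
    assert cols % QK4_0 == 0, "row length must be a multiple of the block size"
    n_blocks = rows * cols // QK4_0
    assert raw.numel() == n_blocks * BLOCK_BYTES, "unexpected byte count"
    blocks = raw.reshape(n_blocks, BLOCK_BYTES)
    # First 2 bytes of each block: the fp16 scale d (native byte order,
    # which matches GGUF's little-endian data on x86/ARM hosts).
    d = blocks[:, :2].contiguous().view(torch.float16).to(torch.float32)  # (n_blocks, 1)
    qs = blocks[:, 2:]                                    # (n_blocks, 16) packed nibbles
    lo = (qs & 0x0F).to(torch.float32) - 8.0              # elements 0..15 of each block
    hi = (qs >> 4).to(torch.float32) - 8.0                # elements 16..31 of each block
    q = torch.cat([lo, hi], dim=1)                        # (n_blocks, 32)
    return (q * d).reshape(rows, cols)


def q4_0_linear(x: torch.Tensor, raw: torch.Tensor,
                out_features: int, in_features: int) -> torch.Tensor:
    """Hypothetical helper: y = x @ W^T with W stored as Q4_0 bytes."""
    # Dequantize on the same device as x, so the matmul runs on CUDA if x is on CUDA.
    w = dequantize_q4_0(raw.to(x.device), out_features, in_features).to(x.dtype)
    return x @ w.T
```

Usage would look like `y = q4_0_linear(x.cuda(), raw_bytes, out_features, in_features)`. The trade-off is that the full float weight matrix is materialized in memory for every call, so you give up the memory and bandwidth savings of 4-bit weights. A fused kernel that reads the nibbles directly inside the matmul would require a custom CUDA extension, which this sketch does not attempt.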