SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License

Help! Want a toy example to run matmul with q40 weight by cuda kernel #219

Open Eutenacity opened 2 weeks ago

Eutenacity commented 2 weeks ago

Sorry, I am not familiar with the library. I want to run a matmul between a tensor created by PyTorch and a q4_0 weight read from a GGUF file. I can read the weight from the GGUF file and convert it to a PyTorch tensor, but I have no idea how to run the matmul between the PyTorch tensor and the q4_0 weight with a CUDA kernel.
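For context, ggml's q4_0 format stores weights in 32-element blocks: one fp16 scale `d` followed by 16 bytes of packed 4-bit quants, where element `i` is the low nibble of byte `i` and element `i+16` is the high nibble, and each value dequantizes as `d * (q - 8)`. As a workaround that sidesteps writing a CUDA kernel entirely, one can dequantize the raw block data to float32 on the host and do an ordinary matmul. A minimal NumPy sketch under that layout assumption (the helper name `dequantize_q4_0` is mine, not part of the library):

```python
import numpy as np

QK4_0 = 32                     # block size used by ggml's q4_0 format
BLOCK_BYTES = 2 + QK4_0 // 2   # fp16 scale + 16 bytes of packed nibbles

def dequantize_q4_0(raw: bytes, n_rows: int, n_cols: int) -> np.ndarray:
    """Dequantize a q4_0-packed weight buffer to a float32 (n_rows, n_cols) array.

    Per 32-element block: fp16 scale d, then 16 bytes of 4-bit quants.
    Element i is the low nibble of byte i, element i+16 the high nibble;
    each dequantized value is d * (q - 8).
    """
    blocks_per_row = n_cols // QK4_0
    data = np.frombuffer(raw, dtype=np.uint8)
    data = data.reshape(n_rows * blocks_per_row, BLOCK_BYTES)
    d = data[:, :2].copy().view(np.float16).astype(np.float32)   # (nb, 1) scales
    qs = data[:, 2:]                                             # (nb, 16) packed
    lo = (qs & 0x0F).astype(np.float32) - 8.0                    # elements 0..15
    hi = (qs >> 4).astype(np.float32) - 8.0                      # elements 16..31
    vals = np.concatenate([lo, hi], axis=1) * d                  # (nb, 32)
    return vals.reshape(n_rows, n_cols)

# Usage: dequantize the q4_0 bytes read from the GGUF file, then matmul.
# One synthetic block: scale 0.5, every byte 0x98 -> low nibble 8 (value 0),
# high nibble 9 (value 0.5).
raw = np.float16(0.5).tobytes() + bytes([0x98] * 16)
W = dequantize_q4_0(raw, 1, 32)
x = np.ones((1, 32), dtype=np.float32)
y = x @ W.T                                                      # plain dense matmul
```

The same float32 array can be wrapped with `torch.from_numpy(...)` and moved to the GPU for `torch.matmul`; this trades the memory savings of staying quantized for simplicity. If you want to stay quantized on-device, the library's own CUDA path dispatches through ggml's `ggml_mul_mat` on tensors of type `GGML_TYPE_Q4_0`, so building a tiny ggml graph is the other option.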