SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

MIT License

7.9k stars, 406 forks

Optimize `mul_mat_sparse` for INT4 quantized weights #174

Closed hodlen closed 6 months ago