Fix segmentation fault for models exceeding 40B on AMD GPUs & optimize mul_mat_axpy operation

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

MIT License

Closed Tworan closed 2 months ago

Tworan commented 2 months ago

We fixed the segmentation fault that occurred for models exceeding 40B parameters on AMD GPUs.
We also optimized the mul_mat_axpy operation and enabled hardware-supported atomic operations on AMD GPUs for better performance.
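
The description above does not show the kernel itself, so the following is only a rough CPU-side sketch of why atomic operations matter for an axpy-style matrix multiply. It assumes that mul_mat_axpy accumulates scaled weight rows into a shared output vector (y += x[r] * W[r, :]) and that several workers may update the same output element at once. The names `atomic_add_float` and `axpy_rows` are hypothetical and are not PowerInfer's API.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper: atomically add `v` to `target` with a compare-and-swap
// loop. This is the software fallback; a hardware float atomic add performs
// the same update in a single instruction, without the retry loop.
inline void atomic_add_float(std::atomic<float>& target, float v) {
    float old = target.load(std::memory_order_relaxed);
    // On failure, `old` is refreshed with the current value and we retry.
    while (!target.compare_exchange_weak(old, old + v,
                                         std::memory_order_relaxed)) {
    }
}

// Hypothetical axpy-style accumulation over rows [row_begin, row_end):
//   y[c] += x[r] * W[r * cols + c]
// W is row-major. Rows whose activation x[r] is zero are skipped, which is
// the point of the axpy formulation for sparse activations. The function is
// safe to call from several threads on disjoint row ranges, because every
// update to the shared vector y goes through an atomic add.
void axpy_rows(const std::vector<float>& W, const std::vector<float>& x,
               std::vector<std::atomic<float>>& y,
               std::size_t row_begin, std::size_t row_end, std::size_t cols) {
    for (std::size_t r = row_begin; r < row_end; ++r) {
        const float a = x[r];
        if (a == 0.0f) continue;  // inactive row: nothing to accumulate
        for (std::size_t c = 0; c < cols; ++c) {
            atomic_add_float(y[c], a * W[r * cols + c]);
        }
    }
}
```

On a GPU the same pattern would typically use `atomicAdd` on the output buffer. When the hardware cannot perform a floating-point atomic add directly, the compiler may fall back to a compare-and-swap loop like the one above, so enabling hardware-supported atomics presumably removes that retry overhead.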