SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License
Fix segmentation fault for models exceeding 40B on AMD GPUs & optimize mul_mat_axpy operation
#217
Closed by Tworan 2 months ago
Tworan commented 2 months ago
We fixed the segmentation fault that occurred for models exceeding 40B on AMD GPUs.
We also optimized the mul_mat_axpy operation and enabled hardware-supported atomic operations on AMD GPUs for better performance.
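For readers unfamiliar with the axpy formulation, here is a minimal CPU sketch of why atomic adds matter for this kind of operation. It is not the PR's GPU kernel. The function name `mul_mat_axpy_sketch`, the transposed weight layout, and the thread partitioning are all assumptions made for illustration. Each nonzero input `x[j]` scales one weight row and adds it into a shared output vector, so concurrent workers have to accumulate atomically. The compare-and-swap loop below stands in for what a GPU with native float atomics can do in a single instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomically add `v` to `target` using a compare-and-swap loop.
// Hardware with native float atomic add would not need the retry loop.
inline void atomic_add(std::atomic<float>& target, float v) {
    float old = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(old, old + v, std::memory_order_relaxed)) {
        // On failure `old` is refreshed with the current value; retry.
    }
}

// Hypothetical helper (not PowerInfer's API): computes y = sum_j x[j] * wt[j],
// where wt[j] is row j of the transposed weight matrix (n_in rows, n_out columns).
std::vector<float> mul_mat_axpy_sketch(const std::vector<std::vector<float>>& wt,
                                       const std::vector<float>& x,
                                       unsigned n_threads) {
    assert(wt.size() == x.size());
    assert(n_threads > 0);
    const std::size_t n_in = x.size();
    const std::size_t n_out = wt.empty() ? 0 : wt[0].size();

    // Shared accumulator that every worker adds into.
    std::vector<std::atomic<float>> acc(n_out);
    for (auto& a : acc) a.store(0.0f, std::memory_order_relaxed);

    auto worker = [&](unsigned tid) {
        // Each worker takes every n_threads-th input element.
        for (std::size_t j = tid; j < n_in; j += n_threads) {
            const float a = x[j];
            if (a == 0.0f) continue;  // skip inactive inputs entirely
            for (std::size_t i = 0; i < n_out; ++i) {
                atomic_add(acc[i], a * wt[j][i]);  // y += a * row_j
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();

    std::vector<float> y(n_out);
    for (std::size_t i = 0; i < n_out; ++i) y[i] = acc[i].load();
    return y;
}
```

With `wt = {{1, 2}, {3, 4}, {5, 6}}` and `x = {1, 0, 2}`, the middle row is skipped and the result is `{11, 14}`. Floating-point atomic adds can land in any order, so results for general inputs may differ in the last bits between runs.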