bytedance / decoupleQ

A quantization algorithm for LLM
Apache License 2.0

Matrix multiplication performance data #10

Closed yyfcc17 closed 1 month ago

yyfcc17 commented 3 months ago

Hello, do you have concrete performance numbers comparing W4A16 matrix multiplication against FP16?

gavinchen430 commented 3 months ago

On an A30, with m=1, n=16384, k=4096, FP16 reaches a bandwidth of roughly 750 GB/s and W4A16 roughly 600 GB/s.
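A back-of-the-envelope sketch of what these bandwidth figures would imply for kernel time. It assumes the quoted bandwidth is the weight bytes read divided by kernel time, and that with m=1 the weight matrix dominates memory traffic. Both are assumptions, not something stated in this thread. Traffic for quantization scales, zero points, activations and outputs is ignored, and `gemv_time_us` is a hypothetical helper name used only for illustration.

```python
def gemv_time_us(n: int, k: int, bits_per_weight: int, bandwidth_gb_s: float) -> float:
    """Estimate kernel time in microseconds from weight traffic alone.

    Assumption: time = weight bytes / achieved bandwidth. Scales, zero points,
    activations and outputs are not counted.
    """
    weight_bytes = n * k * bits_per_weight / 8
    seconds = weight_bytes / (bandwidth_gb_s * 1e9)
    return seconds * 1e6


# Shape and bandwidth figures quoted above (A30, m=1).
N, K = 16384, 4096
fp16_us = gemv_time_us(N, K, bits_per_weight=16, bandwidth_gb_s=750)
w4a16_us = gemv_time_us(N, K, bits_per_weight=4, bandwidth_gb_s=600)
speedup = fp16_us / w4a16_us

print(f"FP16  estimate: {fp16_us:.1f} us")
print(f"W4A16 estimate: {w4a16_us:.1f} us")
print(f"Estimated speedup: {speedup:.2f}x")
```

Under these assumptions the estimate is about 179 µs for FP16 and about 56 µs for W4A16, roughly a 3.2x speedup (4x fewer weight bytes, scaled by 600/750). Measured timings may differ, and no W2A16 bandwidth figure is given here, so it is left out.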

yyfcc17 commented 3 months ago

Thanks for the reply. In terms of overall time, how much speedup do the W4A16 and W2A16 matrix multiplications give over FP16?