Oneflow-Inc / one-codegeex
Apache License 2.0
attention linear support quantization #18 (Closed)
BBuf closed 1 year ago
BBuf commented 1 year ago
[x] The four linear layers in the attention block (q, k, v, and the output projection) now support int8 quantization. At sequence length 1024, speed improved from 7.57 to 14.64, so this should now be faster than FasterTransformer's int8 inference.
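The issue does not include the implementation itself, so as a rough illustration of the underlying idea, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization applied to a linear layer, the kind of transform one would apply to the q/k/v/output projection weights and activations (function names and the scaling scheme are assumptions for illustration, not the repo's actual code):

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: map max |x| to 127 (assumed scheme).
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, weight, bias=None):
    # Quantize activations and weights, accumulate the matmul in int32,
    # then dequantize back to float32 with the product of the two scales.
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(weight)
    acc = qx.astype(np.int32) @ qw.T.astype(np.int32)
    y = acc.astype(np.float32) * (sx * sw)
    if bias is not None:
        y = y + bias
    return y

# Compare against the float reference on random data.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 64)).astype(np.float32)
y_fp = x @ w.T
y_q = int8_linear(x, w)
rel_err = np.abs(y_fp - y_q).max() / np.abs(y_fp).max()
```

The speedup in the comment comes from running the matmuls on int8 tensor cores rather than from this reference-style code; the sketch only shows why the result stays close to the float output.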