Closed fengyuentau closed 8 months ago
Hi, could you please give more information? For example, what do you mean by "quantized attention"? Does it simply refer to quantizing all data types, or to other practices such as quantizing specific ops (e.g. softmax)?
Something like this: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QAttention. The operator prototype may vary, though.
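For context, a quantized attention op of this kind conceptually takes int8 Q/K/V inputs with float scales, dequantizes them, and runs ordinary scaled dot-product attention. The sketch below illustrates that idea only; the function names, shapes, and quantization scheme are assumptions for illustration, not the actual QAttention contract:

```python
# Hedged sketch: int8 Q/K/V + per-tensor scales -> float attention output.
# This is NOT the QAttention spec, just the conceptual computation.
import numpy as np

def quantize(x, scale):
    """Symmetric int8 quantization: q = clip(round(x / scale))."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def quantized_attention(q_i8, k_i8, v_i8, q_scale, k_scale, v_scale):
    """Dequantize int8 Q/K/V, then run float scaled dot-product attention."""
    Q = dequantize(q_i8, q_scale)
    K = dequantize(k_i8, k_scale)
    V = dequantize(v_i8, v_scale)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (seq, seq) attention logits via GEMM
    return softmax(scores) @ V      # (seq, head_dim) output

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8)).astype(np.float32)
K = rng.standard_normal((4, 8)).astype(np.float32)
V = rng.standard_normal((4, 8)).astype(np.float32)
scale = 0.05  # assumed per-tensor scale for this toy example
out = quantized_attention(quantize(Q, scale), quantize(K, scale),
                          quantize(V, scale), scale, scale, scale)
print(out.shape)  # (4, 8)
```

Note the two GEMMs (QK^T and the score-times-V product); these are exactly the operations whose int8 accuracy and efficiency depend on the underlying hardware.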
Thank you for your interest; unfortunately, we have no plans to support this feature yet.
Ok, thanks for the response. One more question: is there any ongoing plan regarding attention?
It will depend on HW support. Which board (IP) are you currently using?
Khadas VIM3 with A311D SoC.
This hardware can provide functional support, but we cannot guarantee good performance, i.e. the accuracy and efficiency of the GEMM operations in attention models.
Okay, thank you for your patience and all the helpful information.
Hello guys, thanks for the great work! I wonder if you have any plans to support quantized attention?