Closed fengyuentau closed 8 months ago
Hi, could you please give more information? For example, what do you mean by "quantized attention"? Does it simply refer to quantizing all data types, or to other practices such as quantizing specific ops (e.g. softmax)?
Something like this: https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QAttention. The operator prototype may vary, though.
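For context, a quantized attention op of this kind conceptually takes int8 Q/K/V inputs with float scales, dequantizes them, and runs ordinary scaled dot-product attention. The sketch below illustrates that idea only; the function names, shapes, and quantization scheme are assumptions for illustration, not the actual QAttention contract:

```python
# Hedged sketch: int8 Q/K/V + per-tensor scales -> float attention output.
# This is NOT the QAttention spec, just the conceptual computation.
import numpy as np

def quantize(x, scale):
    """Symmetric int8 quantization: q = clip(round(x / scale))."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def quantized_attention(q_i8, k_i8, v_i8, q_scale, k_scale, v_scale):
    """Dequantize int8 Q/K/V, then run float scaled dot-product attention."""
    Q = dequantize(q_i8, q_scale)
    K = dequantize(k_i8, k_scale)
    V = dequantize(v_i8, v_scale)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # (seq, seq) attention logits via GEMM
    return softmax(scores) @ V      # (seq, head_dim) output

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8)).astype(np.float32)
K = rng.standard_normal((4, 8)).astype(np.float32)
V = rng.standard_normal((4, 8)).astype(np.float32)
scale = 0.05  # assumed per-tensor scale for this toy example
out = quantized_attention(quantize(Q, scale), quantize(K, scale),
                          quantize(V, scale), scale, scale, scale)
print(out.shape)  # (4, 8)
```

Note the two GEMMs (QK^T and the score-times-V product); these are exactly the operations whose int8 accuracy and efficiency depend on the underlying hardware.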
Thank you for your interest; unfortunately, we have no plans to support this feature yet.
Ok, thanks for the response. One more question: is there any ongoing plan regarding attention?
It will depend on HW support. Which board (IP) are you currently using?
Khadas VIM3 with A311D SoC.
This hardware can provide functional support, but we cannot guarantee good performance, i.e. the accuracy and efficiency of the GEMM operations in attention models.
Okay, thank you for your patience and all the helpful information.
Hello guys, thanks for the great work! I wonder if you have any plans to support quantized attention?