Hello,
In your paper, you mention using an INT8-quantized DeiT model. I noticed that each HCE contains a reformat unit for type conversion between int8 and float32, but I could not find this module in the open-source code. Could you please provide more details on the quantization method you used? Specifically: is it static or dynamic quantization, how were weight quantization and activation quantization each implemented, and what accuracy does the DeiT model achieve after quantization?
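To make the question concrete, here is a minimal sketch of the kind of int8/float32 conversion I have in mind. It is not taken from the paper or the repository: the function names are hypothetical, and symmetric per-tensor quantization with a single scale is only my assumption about what the reformat unit might do.

```python
from typing import List


def quantize_int8(values: List[float], scale: float) -> List[int]:
    """float32 -> int8 (assumed symmetric scheme): round(x / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """int8 -> float32: multiply each integer code by the scale."""
    return [c * scale for c in codes]


def dynamic_scale(values: List[float]) -> float:
    """Dynamic quantization: the scale is derived from the current tensor at run time."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def static_scale(calibration_batches: List[List[float]]) -> float:
    """Static quantization: the scale is fixed ahead of time from calibration data."""
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    return max_abs / 127.0 if max_abs > 0 else 1.0


if __name__ == "__main__":
    activations = [1.0, -0.6, 100.0]
    codes = quantize_int8(activations, scale=0.5)
    print(codes)
    print(dequantize_int8(codes, scale=0.5))
```

If the actual design differs from this (for example per-channel scales, zero points, or scales learned during training), that is exactly the detail I am hoping you can clarify.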
Looking forward to your response. Thank you.