ZiweiWangTHU / Quantformer

This is the official pytorch implementation for the paper: *Quantformer: Learning Extremely Low-precision Vision Transformers*.

HAQ Problem #1

Open CoffeeCat3008871 opened 1 year ago

CoffeeCat3008871 commented 1 year ago

Hi, thank you for sharing your excellent work. In Table 6 of your paper, you show how simply applying HAQ affects the accuracy of several DeiT models compared to the baseline. However, in the supplementary Appendix A, you mention that HAQ was only used for the "fully-connect layers in vision transformer". Does that mean that when applying HAQ, no quantization search was applied to the MatMul operations in the self-attention layers? Also, could you clarify how you apply HAQ in your code? I saw that you use autocast(), but I am not sure how the search part with HAQ is done.
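For reference, here is a minimal sketch of a timm-style attention block (illustrative only, not the repo's code) marking the linear layers versus the two MatMuls the question refers to:

```python
import torch.nn as nn

class Attention(nn.Module):
    """Minimal timm-style multi-head self-attention, shown only to locate
    the linear layers (qkv, proj) vs. the two MatMuls in the forward pass."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)   # linear layer: QKV formulation
        self.proj = nn.Linear(dim, dim)      # linear layer: output projection

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q @ k.transpose(-2, -1)) * self.scale      # MatMul 1: Q @ K^T
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)    # MatMul 2: attn @ V
        return self.proj(x)
```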

ChangyuanWang17 commented 1 year ago

Thanks. For the first question, following the timm implementation of ViT, we quantize all linear layers and apply HAQ to the QKV formulation and to the outputs after the MatMul operations in multi-head self-attention. For the second question, we apply HAQ in our code to search the optimal bitwidth for each linear layer and then use the searched bitwidths for finetuning.
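To make the "search a bitwidth per linear layer, then finetune with it" workflow concrete: HAQ proper trains a reinforcement-learning agent under a resource constraint, but the sketch below is a simplified greedy stand-in, assuming a user-supplied `eval_fn` that returns validation accuracy. All names here are illustrative, not the repo's API.

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def fake_quantize_linear(layer: nn.Linear, bits: int):
    """Uniformly fake-quantize the weights of one linear layer in place
    (symmetric, per-tensor) -- a simple stand-in for a real quantizer."""
    w = layer.weight
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    layer.weight.copy_((w / scale).round().clamp(-qmax, qmax) * scale)

@torch.no_grad()
def search_bitwidths(model: nn.Module, eval_fn, candidate_bits=(2, 3, 4, 8)):
    """Greedy per-layer sweep: try each candidate bitwidth on each nn.Linear,
    keep the one with the best score from eval_fn, and return the per-layer
    choices, which would then stay fixed during finetuning."""
    chosen = {}
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        best_bits, best_score = None, float("-inf")
        for bits in candidate_bits:
            trial = copy.deepcopy(model)
            fake_quantize_linear(dict(trial.named_modules())[name], bits)
            score = eval_fn(trial)
            if score > best_score:
                best_bits, best_score = bits, score
        chosen[name] = best_bits
    return chosen
```

The returned per-layer bitwidths would then be applied to the quantized model before the finetuning stage described above.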