Zhen-Dong / HAWQ

Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
MIT License

pack_int32_to_int4 #29

Open YangNuoCheng opened 2 years ago

YangNuoCheng commented 2 years ago

In 'HAWQ-main/tvm_benchmark/hawq_utils_resnet50.py', we pack 8 'int4' numbers into 1 'int32' number, so we get an int4 speedup. Can we pack 16 'int2' values into 1 'int32' to get an int2 speedup?
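
For reference, here is a minimal NumPy sketch of this style of bit-packing. The function name `pack_int4_to_int32` and the layout (lowest nibble first) are illustrative assumptions, not the routine actually used in `hawq_utils_resnet50.py`:

```python
import numpy as np

def pack_int4_to_int32(x: np.ndarray) -> np.ndarray:
    """Pack groups of 8 signed int4 values (each in [-8, 7]) into
    single int32 words, 4 bits per value, lowest nibble first.

    Illustrative sketch only -- not HAWQ's actual packing code.
    """
    assert x.size % 8 == 0, "need a multiple of 8 values"
    x = x.reshape(-1, 8).astype(np.uint32)  # two's-complement wrap
    packed = np.zeros(x.shape[0], dtype=np.uint32)
    for i in range(8):
        # keep only the low 4 bits of each value, then shift into place
        packed |= (x[:, i] & 0xF) << (4 * i)
    return packed.view(np.int32)

# Example: 8 int4 values collapse into one int32 word.
vals = np.array([1, -2, 3, -4, 5, -6, 7, -8], dtype=np.int32)
print(pack_int4_to_int32(vals))
```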

zachzzc commented 2 years ago

Yes. The purpose of the packing is to handle memory movement with a data type that is supported on the target hardware (int8, int32 on CPU/GPU). If you want to further reduce the precision to int2, on CPU/GPU you also need to pack the values into a byte-addressable data type (int8, int32) before the memory movement.
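
Following that reasoning, the int2 case is the same idea with a 2-bit mask and 16 lanes per 32-bit word. A minimal sketch under the same assumptions (two's-complement 2-bit values in [-2, 1], hypothetical helper name `pack_int2_to_int32`; not HAWQ's actual code):

```python
import numpy as np

def pack_int2_to_int32(x: np.ndarray) -> np.ndarray:
    """Pack groups of 16 signed int2 values (each in [-2, 1]) into
    single int32 words, 2 bits per value, lowest lanes first.

    Illustrative sketch only -- not HAWQ's actual packing code.
    """
    assert x.size % 16 == 0, "need a multiple of 16 values"
    x = x.reshape(-1, 16).astype(np.uint32)  # two's-complement wrap
    packed = np.zeros(x.shape[0], dtype=np.uint32)
    for i in range(16):
        # keep only the low 2 bits of each value, then shift into place
        packed |= (x[:, i] & 0x3) << (2 * i)
    return packed.view(np.int32)
```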

YangNuoCheng commented 2 years ago

> Yes. The purpose of the packing is to handle memory movement with a data type that is supported on the target hardware (int8, int32 on CPU/GPU). If you want to further reduce the precision to int2, on CPU/GPU you also need to pack the values into a byte-addressable data type (int8, int32) before the memory movement.

Thank you for your reply! Actually, I am reproducing your great project, and I am trying to apply it in my research. Thanks a lot!