Zhen-Dong / HAWQ

Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware deployment through TVM.
MIT License

MobileNetV2 TVM W4A4 inference #42

Closed qiulinzhang closed 3 months ago

qiulinzhang commented 3 months ago

Dear Zhen-Dong, thanks for your great work. I have finished ResNet18/50 W4A4/W8A8 inference with TVM on CUDA (RTX 3090 Ti):

ResNet18: 0.22 ms (W4A4), 0.26 ms (W8A8)

Now I want to run inference on MobileNetV2, which contains depthwise convolutions. I reimplemented MobileNetV2 based on the ResNet18 code, but it failed at the autotvm config stage, so I get a high inference time, as shown below:

```
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 160, 7, 7), 'int8'), ('TENSOR', (960, 160, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('depthwise_conv2d_nchw.cuda', ('TENSOR', (8, 960, 9, 9), 'int8'), ('TENSOR', (960, 1, 3, 3), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 960, 7, 7), 'int8'), ('TENSOR', (160, 960, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 960, 7, 7), 'int8'), ('TENSOR', (320, 960, 1, 1), 'int8'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_NCHWc_int8.cuda', ('TENSOR', (8, 320, 9, 9), 'int8'), ('TENSOR', (1280, 320, 3, 3), 'int8'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'int32'). A fallback configuration is used, which may bring great performance regression.
WARNING:autotvm:Cannot find config for target=cuda, workload=('dense_int8.cuda', ('TENSOR', (8, 1280), 'int8'), ('TENSOR', (1000, 1280), 'int8'), None, 'int32'). A fallback configuration is used, which may bring great performance regression.
Performed inference in 61.58ms (std = 0.10) for 8 samples
Average per sample inference time: 7.70ms
```

Are there any helpful suggestions or autotvm configs?

qiulinzhang commented 3 months ago

Thanks for your code with the TVM tuning function. I solved the problem simply by enabling tuning (tuning-enable = True).
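For reference, the fallback warnings above typically disappear once the missing workloads are tuned and the resulting log is applied at build time. Below is a minimal sketch of the standard autotvm tuning loop; the names `mod`, `params` (the quantized Relay module and its parameters, e.g. converted from the PyTorch MobileNetV2) and the log filename are assumptions for illustration, not taken from this repo:

```python
# Sketch (assumption-laden): tune the int8 conv / depthwise-conv / dense
# workloads that triggered the fallback warnings, then rebuild with the
# tuned configs applied. Requires a CUDA-enabled TVM build and a GPU.
import tvm
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner

target = "cuda"
log_file = "mobilenetv2_int8.log"  # arbitrary filename, an assumption

# Extract every tunable task from the quantized Relay module
# (conv2d_NCHWc_int8.cuda, depthwise_conv2d_nchw.cuda, dense_int8.cuda, ...).
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=10, repeat=3, min_repeat_ms=150),
)

for i, task in enumerate(tasks):
    tuner = XGBTuner(task, loss_type="rank")
    n_trial = min(1000, len(task.config_space))
    tuner.tune(
        n_trial=n_trial,
        measure_option=measure_option,
        callbacks=[
            autotvm.callback.progress_bar(n_trial, prefix=f"[Task {i + 1}/{len(tasks)}]"),
            autotvm.callback.log_to_file(log_file),
        ],
    )

# Apply the best configs found during tuning; with a config recorded for
# each workload, autotvm no longer falls back to the default schedule.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```

The depthwise convolution layers are the usual reason MobileNetV2 is much slower than ResNet under fallback configs, since their default CUDA schedules are far from optimal at these small spatial sizes.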