Zhen-Dong / HAWQ

Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.

Questions with HAWQ/tvm_benchmark #14

Closed. Junkyeong-Choi closed this issue 3 years ago.

Junkyeong-Choi commented 3 years ago

Hi, I'm about to run the test with "tvm_benchmark/test_resnet_inference.py" on a Tesla V100 and compare the results with a Tesla T4 device. However, I encountered some errors while building the module with tvm.relay [relay.build(..)]. I know this is a natural consequence, since the README states that the procedure targets an NVIDIA T4 GPU for inference speed-up, but my questions are:

[image attachment]

I would appreciate your reply; any pointers would be helpful.
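
The failing step is presumably the Relay build call inside the benchmark script. Below is a minimal sketch of what such a call typically looks like in TVM, just to pin down where the nvcc error surfaces; the function name build_for_cuda, the arch argument, and the exact target string are illustrative assumptions, and the int4 branch of TVM used by HAWQ may expose a slightly different API.

```python
# Minimal sketch (not taken from the repo) of a Relay build for a CUDA target.
# Names such as build_for_cuda and arch are illustrative assumptions.
import tvm
from tvm import relay

def build_for_cuda(mod, params, arch="sm_75"):
    # The nvcc error reported above surfaces inside relay.build, when the
    # generated .cu kernels are compiled for the selected GPU architecture.
    target = tvm.target.Target("cuda -arch=%s" % arch)
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    return lib
```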

zachzzc commented 3 years ago

Hi @adwwsd,

Thanks for your interest in this project!

> Which part of the code creates the GPU dependency? I guess it is due to the int4 configuration in the code and the use of the specific int4 branch of TVM. Am I right about this?

You are right, it is due to the INT4 configuration, and not all GPUs support it. So far only the latest NVIDIA A100 and the Turing series support Tensor Core INT4 matrix multiplication.
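
A quick way to confirm this on a given machine is to check the GPU's CUDA compute capability from PyTorch; the snippet below is illustrative and not part of HAWQ. Turing GPUs (e.g. the T4) report 7.5 and the A100 reports 8.0, both of which have INT4 Tensor Cores, while the V100 reports 7.0 and does not.

```python
# Illustrative check (not part of HAWQ): INT4 Tensor Core WMMA requires
# compute capability 7.5 (Turing) or newer; a V100 reports 7.0.
import torch

major, minor = torch.cuda.get_device_capability(0)
print("GPU:", torch.cuda.get_device_name(0), "compute capability:", f"{major}.{minor}")
if (major, minor) < (7, 5):
    print("No INT4 Tensor Core support; expect nvcc errors on the INT4 kernels.")
```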

> Even if this is the case, I still have questions about the errors I got, because the error was an nvcc compile error on the temporary .cu file (a type error). Does nvcc have this much GPU dependency?

> Additionally, it worked fine on my T4 GPU server in the same environment, with only the device itself being different.

It looks like this GPU doesn't support the INT4 Tensor Core intrinsics. The computation schedule eventually calls the wmma CUDA intrinsics to compute a small matrix multiplication, so if the GPU doesn't support INT4 data-type operations, nvcc will report errors.
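
For reference, the signed 4-bit wmma fragments operate on a fixed 8x8x32 tile with 32-bit accumulation. The NumPy snippet below is purely an illustration of that tile-level arithmetic (no packing or hardware detail), not HAWQ or TVM code.

```python
# Rough illustration of the tile product an INT4 wmma intrinsic computes:
# an 8x32 by 32x8 product of signed 4-bit values ([-8, 7]) accumulated in int32.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-8, 8, size=(8, 32), dtype=np.int8)   # stand-in for the int4 A fragment
B = rng.integers(-8, 8, size=(32, 8), dtype=np.int8)   # stand-in for the int4 B fragment
C = A.astype(np.int32) @ B.astype(np.int32)            # int32 accumulator fragment
print(C.shape, C.dtype)  # (8, 8) int32
```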

Junkyeong-Choi commented 3 years ago

Thanks a lot for your answer!