Closed leiwen83 closed 2 years ago
Hi Lei,
Our implementation in TVM is meant to show the overall speed-up and accuracy we can get. The overall speed for int8 is still slower than TensorRT because the TVM Tensor Core convolution schedule is suboptimal. Currently, int4 convolution is not available in TensorRT, and it's not open source. That's why we chose TVM to measure the speed-up of int8 vs. int4. When int4 becomes available in TensorRT, we will import the trained weights and benchmark it.
Thanks, Zach
I see. I could help port int8 to TensorRT as a first try. But when I tried to run the TVM inference, I hit an error. Which version of TVM are you using to run the benchmark?
python hawq_utils_resnet50.py --model-dir data/
Traceback (most recent call last):
File "hawq_utils_resnet50.py", line 9, in <module>
from mixed_precision_models.layers import QConfig, QuantizeContext
File "/data/dev/quant/hawq/tvm_benchmark/mixed_precision_models/__init__.py", line 1, in <module>
from . import layers
File "/data/dev/quant/hawq/tvm_benchmark/mixed_precision_models/layers.py", line 12, in <module>
defaults=('int32', 65.0, 0.0, 'int8', 8.0, 0.0, 'int8', 8.0, 0.0, 'int32', 74.0, 0.0))
TypeError: namedtuple() got an unexpected keyword argument 'defaults'
Hi Lei,
What Python version are you using? It looks like a mismatch in the namedtuple declaration. I am using Python 3.7.4 and don't see this error.
Zach
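The `defaults` keyword of `collections.namedtuple` was added in Python 3.7, which is consistent with the TypeError above appearing on an older interpreter. A minimal sketch of a version-tolerant declaration follows. The helper name `namedtuple_with_defaults` and the field names are hypothetical and are not taken from layers.py.

```python
import sys
from collections import namedtuple


def namedtuple_with_defaults(typename, field_names, defaults):
    """Build a namedtuple whose rightmost fields get default values."""
    if sys.version_info >= (3, 7):
        # Python 3.7+ supports the `defaults` keyword directly.
        return namedtuple(typename, field_names, defaults=defaults)
    # Older Python 3: attach the defaults to the generated __new__ instead.
    cls = namedtuple(typename, field_names)
    cls.__new__.__defaults__ = tuple(defaults)
    return cls


# Hypothetical fields, for illustration only.
DemoConfig = namedtuple_with_defaults(
    "DemoConfig",
    ["dtype", "num_bits", "zero_point"],
    ("int8", 8.0, 0.0),
)

cfg = DemoConfig(num_bits=4.0)
print(cfg)
```

Running the benchmark scripts on Python 3.7 or newer, as suggested above, avoids the need for such a shim.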
Hi Zach,
I switched to Python 3.7, but hit a new error:
(512, 256, 1, 1)
module.stage4.unit2.quant_convbn1.weight_integer (512, 512, 3, 3)
module.stage4.unit2.quant_convbn2.weight_integer (512, 512, 3, 3)
module.quant_output.weight_integer (1000, 512)
Traceback (most recent call last):
File "hawq_utils_resnet50.py", line 499, in
File "hawq_utils_resnet50.py", line 136, in save_weights
renamed_params['conv0_weight'] = params['module.quant_init_convbn.weight_integer']
KeyError: 'module.quant_init_convbn.weight_integer'
Which weight checkpoint are you using?
I was using a checkpoint created by local training. After downloading the checkpoint from the model zoo, it seems to work now.
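A KeyError like the one above can be caught earlier by checking the checkpoint's keys before renaming them. The sketch below assumes the checkpoint is a plain dict. The helper name and the locally trained key name are made up for illustration; only `module.quant_init_convbn.weight_integer` and `module.quant_output.weight_integer` come from the output above.

```python
import difflib


def report_missing_keys(params, required_keys):
    """Return {missing_key: [similar keys that are present in params]}."""
    present = list(params)
    report = {}
    for key in required_keys:
        if key not in params:
            # Suggest near-matches to make a naming mismatch easy to spot.
            report[key] = difflib.get_close_matches(key, present, n=3)
    return report


# Hypothetical keys standing in for a locally trained checkpoint.
local_params = {
    "module.init_convbn.weight_integer": None,
    "module.quant_output.weight_integer": None,
}
required = [
    "module.quant_init_convbn.weight_integer",
    "module.quant_output.weight_integer",
]

report = report_missing_keys(local_params, required)
for key, similar in report.items():
    print("missing:", key, "| similar keys found:", similar)
```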
However, there is still a problem in inference:
File "test_resnet_inference.py", line 23, in
import hawq_utils
ModuleNotFoundError: No module named 'hawq_utils'
I haven't found this module anywhere in this repo. Does it come from another Git repository?
I've updated test_resnet_inference.py. Please pull the updates and try again.
I get a new error...
python3.7 test_resnet_inference_time.py
Traceback (most recent call last):
File "test_resnet_inference_time.py", line 178, in <module>
debug_unit=args.debug_unit)
File "/data/dev/quant/hawq/tvm_benchmark/mixed_precision_models/quantized_resnet_v1.py", line 614, in get_workload
**kwargs)
File "/data/dev/quant/hawq/tvm_benchmark/mixed_precision_models/quantized_resnet_v1.py", line 557, in get_net
with_softmax=with_softmax)
File "/data/dev/quant/hawq/tvm_benchmark/mixed_precision_models/quantized_resnet_v1.py", line 362, in qnn_resnet_v1
data_layout=_data_layout, kernel_layout=kernel_layout)
File "/data/dev/quant/hawq/tvm_benchmark/mixed_precision_models/layers.py", line 122, in quantized_conv2d
kernel_size=kernel_size, channels=output_channels, data_layout=data_layout, kernel_layout=kernel_layout, strides=strides, padding=padding, **kwargs)
File "/data/dev/inference/tvm/python/tvm/relay/qnn/op/qnn.py", line 278, in conv2d
data_layout, kernel_layout, out_layout, out_dtype)
File "/data/dev/inference/tvm/python/tvm/_ffi/_ctypes/function.py", line 207, in __call__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (4) /data/dev/inference/tvm/build/libtvm.so(TVMFuncCall+0x61) [0x7f517f78ea51]
[bt] (3) /data/dev/inference/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), void tvm::runtime::TypedPackedFunc<tvm::relay::Expr (tvm::relay::Expr, tvm::relay::Expr, int, int, double, double, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, int, tvm::Expr, tvm::Array<tvm::Expr, void>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::DataType)>::AssignTypedLambda<tvm::relay::Expr (*)(tvm::relay::Expr, tvm::relay::Expr, int, int, double, double, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, int, tvm::Expr, tvm::Array<tvm::Expr, void>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::DataType)>(tvm::relay::Expr (*)(tvm::relay::Expr, tvm::relay::Expr, int, int, double, double, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, int, tvm::Expr, tvm::Array<tvm::Expr, void>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::DataType))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x25d) [0x7f517f6efcdd]
[bt] (2) /data/dev/inference/tvm/build/libtvm.so(void tvm::runtime::detail::unpack_call_dispatcher<tvm::relay::Expr, 0, 16, tvm::relay::Expr (*)(tvm::relay::Expr, tvm::relay::Expr, int, int, double, double, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, int, tvm::Expr, tvm::Array<tvm::Expr, void>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::DataType)>::run<tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue, tvm::runtime::TVMArgValue>(tvm::relay::Expr (* const&)(tvm::relay::Expr, tvm::relay::Expr, int, int, double, double, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, tvm::Array<tvm::Expr, void>, int, tvm::Expr, tvm::Array<tvm::Expr, void>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::DataType), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&, 
tvm::runtime::TVMArgValue&&, tvm::runtime::TVMArgValue&&)+0x19e) [0x7f517f6ef71e]
[bt] (1) /data/dev/inference/tvm/build/libtvm.so(tvm::runtime::TVMPODValue_::operator double() const+0x159) [0x7f517eff02d9]
[bt] (0) /data/dev/inference/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f517ef9fe72]
File "/data/dev/inference/tvm/include/tvm/runtime/packed_func.h", line 447
TVMError: Check failed: type_code_ == kDLFloat (8 vs. 2) : expected float but get ObjectCell
When I tried to modify test_resnet_inference.py directly, as you did for test_resnet_inference_time.py, I found that it reports a missing ./data/input_image_batch_1.npy, which requires --with-featuremap to be appended to the hawq_utils_resnet50.py command.
But --with-featuremap would require ./data/input_image.pth.tar which is missing from the original modelzoo...
Are you using the TVM under the HAWQ repo? You need to use that one.
The checkpoint doesn't contain an input image now. If you want to use test_resnet_inference.py, you can create your own image and save it as input_image_batch_1.npy. I will check in an image as a demo.
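One way to confirm which TVM checkout Python would pick up is to look at where the `tvm` package resolves from, without importing it. This is a sketch using only the standard library. The root path is copied from the tracebacks in this thread and is just an example, and the helper names are hypothetical.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_under(path, root):
    """True if `path` lies inside the directory tree rooted at `root`."""
    path, root = os.path.abspath(path), os.path.abspath(root)
    return os.path.commonpath([path, root]) == root


# Example location of the TVM bundled with HAWQ (adjust to your checkout).
HAWQ_TVM_ROOT = "/data/dev/quant/hawq/tvm"

origin = module_origin("tvm")
if origin is None:
    print("tvm is not importable from this environment")
elif is_under(origin, HAWQ_TVM_ROOT):
    print("using the HAWQ TVM:", origin)
else:
    print("using a different TVM:", origin)
```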
After switching to the internal TVM repo, the previous error seems to have gone away, but a new one appears... How about creating a Dockerfile to describe your working environment, for example using nvcr.io/nvidia/pytorch:20.12-py3 as the base image?
...100%, 0.40 MB, 463 KB/s, 0 seconds passed
Traceback (most recent call last):
File "test_resnet_inference_time.py", line 232, in <module>
graph, lib, params = relay.build(func, target=TARGET_NAME, params=params)
File "/data/dev/quant/hawq/tvm/python/tvm/relay/build_module.py", line 251, in build
graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
File "/data/dev/quant/hawq/tvm/python/tvm/relay/build_module.py", line 120, in build
self._build(mod, target, target_host)
File "/data/dev/quant/hawq/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 219, in __call__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (8) /data/dev/quant/hawq/tvm/build/libtvm.so(std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x17) [0x7f76570d2527]
[bt] (7) /data/dev/quant/hawq/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x191) [0x7f76570d2431]
[bt] (6) /data/dev/quant/hawq/tvm/build/libtvm.so(tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)+0x7b9) [0x7f76570d1849]
[bt] (5) /data/dev/quant/hawq/tvm/build/libtvm.so(tvm::build(tvm::Map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::IRModule, void, void> const&, tvm::Target const&, tvm::BuildConfig const&)+0x4e9) [0x7f7656c668f9]
[bt] (4) /data/dev/quant/hawq/tvm/build/libtvm.so(tvm::build(tvm::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&, tvm::BuildConfig const&)+0x275) [0x7f7656c653c5]
[bt] (3) /data/dev/quant/hawq/tvm/build/libtvm.so(tvm::codegen::Build(tvm::IRModule, tvm::Target const&)+0x239) [0x7f7656ca8a89]
[bt] (2) /data/dev/quant/hawq/tvm/build/libtvm.so(void tvm::runtime::detail::unpack_call<tvm::runtime::Module, 2, tvm::runtime::Module (*)(tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(tvm::runtime::Module (* const&)(tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)+0x18b) [0x7f7656ccb7eb]
[bt] (1) /data/dev/quant/hawq/tvm/build/libtvm.so(tvm::codegen::BuildCUDA(tvm::IRModule, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0xd21) [0x7f7657173011]
[bt] (0) /data/dev/quant/hawq/tvm/build/libtvm.so(+0xd7e69b) [0x7f76571f369b]
File "/data/dev/quant/hawq/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
rv = local_pyfunc(*pyargs)
File "/data/dev/quant/hawq/tvm/python/tvm/autotvm/measure/measure_methods.py", line 599, in tvm_callback_cuda_compile
ptx = nvcc.compile_cuda(code, target=target, arch=AutotvmGlobalScope.current.cuda_target_arch)
File "/data/dev/quant/hawq/tvm/python/tvm/contrib/nvcc.py", line 103, in compile_cuda
raise RuntimeError(msg)
RuntimeError: Compilation error:
/tmp/tmpstr8srtm/my_kernel.cu(18): error: name followed by "::" must be a class or namespace name
/tmp/tmpstr8srtm/my_kernel.cu(19): error: name followed by "::" must be a class or namespace name
/tmp/tmpstr8srtm/my_kernel.cu(38): error: incomplete type is not allowed
/tmp/tmpstr8srtm/my_kernel.cu(40): error: name followed by "::" must be a class or namespace name
/tmp/tmpstr8srtm/my_kernel.cu(40): error: incomplete type is not allowed
/tmp/tmpstr8srtm/my_kernel.cu(41): error: name followed by "::" must be a class or namespace name
/tmp/tmpstr8srtm/my_kernel.cu(41): error: incomplete type is not allowed
What you suggest is good. We will create a Dockerfile to make life easier.
I suspect the error is caused by the CUDA version. We are using CUDA 10.2. The TVM instructions here describe the environment in detail.
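To compare a local toolkit against the CUDA 10.2 mentioned above, the release number can be read from `nvcc --version`. The following is a rough sketch: the helper names are hypothetical, and it only inspects the `nvcc` found on PATH, which is an assumption about which compiler gets used.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Return the CUDA release of the nvcc on PATH, or None if not found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_cuda_release(result.stdout)


EXPECTED = (10, 2)  # the version the maintainers report using
found = installed_cuda_release()
if found is None:
    print("nvcc not found on PATH")
elif found != EXPECTED:
    print("CUDA %d.%d found, maintainers use %d.%d" % (found + EXPECTED))
else:
    print("CUDA version matches the maintainers' setup")
```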
Hi, how should the input image be preprocessed? I see that quant_train.py normalizes the input image as part of its preprocessing.
Hi, there is an error when running "python hawq_utils_resnet50.py --model-dir ./data/resnet50_uniform8 --with-featuremap"
dict_keys(['convbn_scaling_factor', 'fc_scaling_factor', 'weight_integer', 'bias_integer', 'act_scaling_factor'])
(886, 1604, 3)
Traceback (most recent call last):
File "hawq_utils_resnet50.py", line 494, in <module>
feature_map = torch.load(featuremap_name)['featuremap']
File "/home/yuhaibao94/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 584, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/yuhaibao94/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/yuhaibao94/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './data/resnet50_uniform8/featuremaps.pth.tar'
About the ModuleNotFoundError for hawq_utils: I think we should change "import hawq_utils" to "import hawq_utils_resnet50 as hawq_utils".
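If a script has to work both before and after such a rename, the import can fall back from one module name to the other. This is only a sketch: the helper `import_first_available` is hypothetical, and the runnable demo uses the standard-library `json` module as a stand-in for `hawq_utils_resnet50`.

```python
import importlib


def import_first_available(candidates):
    """Import and return the first module in `candidates` that exists."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Only skip when the candidate itself is missing, not when one
            # of its own dependencies fails to import.
            if exc.name != name:
                raise
    raise ModuleNotFoundError(
        "none of these modules could be imported: " + ", ".join(candidates)
    )


# In test_resnet_inference.py this would read (module names from the thread):
#   hawq_utils = import_first_available(["hawq_utils", "hawq_utils_resnet50"])

# Runnable stand-in: the first name does not exist, so `json` is returned.
demo = import_first_available(["hawq_utils_not_installed_demo", "json"])
print(demo.__name__)
```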
Right, the input image needs to be pre-processed and saved as input_image_batch_1.npy.
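A rough sketch of producing such a file with NumPy is shown below. Everything specific here is an assumption rather than something confirmed in this thread: the mean and standard deviation are the commonly used ImageNet values, the layout is NCHW with a 224x224 input, and the random array stands in for a real decoded image. Check quant_train.py for the actual normalization before relying on it.

```python
import os
import tempfile

import numpy as np

# Assumed normalization constants (commonly used ImageNet statistics).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc_uint8):
    """Scale to [0, 1], normalize per channel, and return a 1xCxHxW batch."""
    x = image_hwc_uint8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD              # broadcasts over the channel axis
    x = np.transpose(x, (2, 0, 1))    # HWC -> CHW
    return x[np.newaxis, ...]         # add the batch dimension


# Stand-in for a real image that was already resized and cropped to 224x224.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

batch = preprocess(fake_image)
out_path = os.path.join(tempfile.mkdtemp(), "input_image_batch_1.npy")
np.save(out_path, batch)
print(batch.shape, batch.dtype, out_path)
```

For the real benchmark, the file would be written into the ./data directory that test_resnet_inference.py reads from, instead of a temporary directory.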
In this checkpoint we didn't save intermediate feature maps, which is why featuremaps.pth.tar is missing.
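Since not every checkpoint ships a feature-map file, a script can treat it as optional instead of failing with FileNotFoundError. A minimal sketch, with hypothetical helper names: in the real script the loader would be `torch.load`, while a plain byte reader is used here so the example runs without PyTorch.

```python
import os
import tempfile


def load_if_present(path, loader):
    """Call loader(path) when the file exists; otherwise return None."""
    if not os.path.isfile(path):
        return None
    return loader(path)


def read_bytes(path):
    """Stand-in loader for the demo; the real script would use torch.load."""
    with open(path, "rb") as handle:
        return handle.read()


workdir = tempfile.mkdtemp()
featuremap_path = os.path.join(workdir, "featuremaps.pth.tar")

feature_map = load_if_present(featuremap_path, read_bytes)
if feature_map is None:
    print("no feature maps in this checkpoint, skipping the comparison")
```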
Hi,
Have we compared the TVM inference speed with its TensorRT counterpart? We know that TensorRT's CNN inference can reach the hardware's peak speed.
Thx, Lei