ctuning/ck-mlperf

This repository is outdated! Join the open MLPerf workgroup to participate in the development of the next generation of automation workflows for MLPerf benchmarks:
https://bit.ly/mlperf-edu-wg

Way to convert ResNet50's ONNX model to a quantized TensorRT engine #60

Closed BeyondCloud closed 3 years ago

BeyondCloud commented 3 years ago

Hi, I am trying to reproduce the ResNet50 benchmark result shown on mlperf.org (edge/closed division, row 4). Currently I can achieve 2.4 ms latency on the Jetson AGX Xavier in the SingleStream scenario. However, I used trtexec to generate the INT8 engine file from resnet50_v1.onnx, and trtexec does not support calibration (without a calibration cache it falls back to random calibration scales). Could you tell me how you converted resnet50_v1.onnx to the INT8 engine file (or plan file) used in this repo?
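For context, the kind of calibration flow I am asking about looks roughly like the sketch below. This is only a minimal sketch, assuming the TensorRT 7.x Python API (as shipped with JetPack) and pycuda; the batch preparation, file names, and workspace size are illustrative placeholders, not the code used for the submission:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.INFO)


class ImageNetCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to TensorRT and caches the scales."""

    def __init__(self, batches, cache_file="calibration.cache"):
        super().__init__()
        self.batches = batches  # list of NCHW float32 arrays, already preprocessed
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None  # no more batches: calibration is done
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()  # reuse previously computed scales if available
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)


def build_int8_plan(onnx_path, calibrator, plan_path="resnet50_int8.plan"):
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0).desc())
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB; tune for the device
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator  # scales come from real data, not random
    engine = builder.build_engine(network, config)
    with open(plan_path, "wb") as f:
        f.write(engine.serialize())
```

The resulting calibration.cache could then presumably be fed back to trtexec via its --calib flag, so the engine itself could still be built on the command line.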

psyhtest commented 3 years ago

Hi @BeyondCloud,

To obtain the TensorRT plans, we followed NVIDIA's instructions from their MLPerf Inference v0.5 submission. This happened between the v0.5 and v0.7 submission rounds, i.e. between October 2019 and September 2020. We submitted the results to v0.7 (without DLA support).

After v0.7, we reproduced some of the results with JetPack 4.5, resolving a few issues along the way. We did not create CK packages for the new plans, though.

When the v1.0 results are out on 21 April 2021, I'd encourage you to reproduce them in the same way and share your experience with the community!

Please note that this repository, ctuning/ck-mlperf, is no longer maintained. Please switch to krai/ck-mlperf to get the most recent updates.

BeyondCloud commented 3 years ago

OK, I will take a look, thank you!

BeyondCloud commented 3 years ago

I can't thank you enough. I followed the instructions in the new repo and am now able to reproduce the results.

psyhtest commented 3 years ago

You are very welcome @BeyondCloud!