Open Kongsea opened 1 month ago
Running trtexec --onnx=model.onnx --saveEngine=model.trt --int8 without calibration data produces a TensorRT engine that runs inference and yields a low-precision image.
However, running polygraphy convert model.onnx --int8 -o model.trt without calibration data produces an engine whose output is abnormal: the values are all very small numbers.
I then wrote a data_loader.py so that polygraphy could quantize the ONNX model with calibration data, but the output is very similar to the output without calibration data. I am confused.
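import cv2
import numpy as np

def load_data():
    # `images` is a list of image file paths defined elsewhere in the script.
    for image in images:
        img = cv2.imread(image, 0)  # flag 0: read as grayscale, shape (H, W)
        if len(img.shape) == 2:
            img = np.expand_dims(img, axis=2)  # (H, W) -> (H, W, 1)
        # (H, W, C) -> (1, C, H, W), then cast to float16
        img = (np.transpose(np.ascontiguousarray(np.expand_dims(img, axis=0)), (0, 3, 1, 2))).astype(np.float16)
        yield {"input": img}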
I think the trtexec and polygraphy commands should be doing the same thing. Not sure why they are giving different results. cc: @pranavm-nvidia
trtexec will initialize the dynamic ranges to fixed values, while polygraphy will calibrate on the input data (if none is provided, it calibrates on synthetic data).
How many images are you using for calibration?
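To illustrate why the choice of dynamic range matters, here is a minimal sketch of generic symmetric int8 quantization. It is not TensorRT's actual implementation, and the activation values and ranges are hypothetical; it only shows how a range that is too small saturates large values while a range that is too large collapses small values toward zero.

```python
import numpy as np

def quantize_dequantize(x, dynamic_range):
    """Simulate symmetric int8 quantization over [-dynamic_range, dynamic_range]."""
    scale = dynamic_range / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

# Hypothetical activations; real values depend on the model.
x = np.array([0.02, -0.5, 1.5, 3.0], dtype=np.float32)

# Range too small: everything above 1.0 saturates at 1.0.
too_small = quantize_dequantize(x, dynamic_range=1.0)

# Range fitted to the data: error is bounded by about half a quantization step.
fitted = quantize_dequantize(x, dynamic_range=float(np.abs(x).max()))

# Range far too large: every value here rounds to zero.
too_large = quantize_dequantize(x, dynamic_range=1000.0)

print(too_small, fitted, too_large)
```

A range derived from synthetic data, or fixed in advance, can land in either failure mode if it does not resemble the activations the real inputs produce.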
This is the output of trtexec using --fp16, without calibration:
The following is the output of trtexec using --int8, without calibration:
The following is the output of trtexec using --best, without calibration:
The following is the output of trtexec using --int8, with int8 calibration data:
So I want to know whether this is caused by an incorrect method of generating the calibration data.
When using polygraphy, an error is raised now:
[E] 1: [calibrator.cpp::add::798] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [executionContext.cpp::commonEmitDebugTensor::1517] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a972910'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a97a4d0'.)
........................
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7e2c0d90'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a89aa90'.)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
.......
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 2: [calibrator.cpp::calibrateEngine::1222] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
However, it worked before, and I have not modified anything.
I have tried calibrating the model with 500, 1000, and more than 3000 images. However, the result is almost the same.
Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.
I used FP16 when training the network. So do I need to use FP32 inputs to calibrate the model when I quantize it? Thank you.
I believe so. We disable FP16 mode when calibrating.
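As a rough sketch of the FP32 suggestion, an existing loader can be wrapped so that every array it yields is cast to float32 before calibration. The names as_fp32 and fp16_loader are hypothetical; fp16_loader is only a stand-in for a loader like the one posted above, which yields float16 arrays.

```python
import numpy as np

def as_fp32(loader):
    """Wrap a data loader so every yielded array is contiguous float32."""
    for feed_dict in loader:
        yield {
            name: np.ascontiguousarray(arr, dtype=np.float32)
            for name, arr in feed_dict.items()
        }

def fp16_loader():
    # Stand-in for the original loader, which yields float16 arrays.
    for _ in range(2):
        yield {"input": np.zeros((1, 1, 4, 4), dtype=np.float16)}

batches = list(as_fp32(fp16_loader()))
print(batches[0]["input"].dtype)
```

In data_loader.py, load_data() could then simply delegate with `yield from as_fp32(...)`, or the original `.astype(np.float16)` could be changed to `.astype(np.float32)` directly.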
The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.
The example data_loader.py file uses fake data. I want to know how to write this file so that it feeds image data to Polygraphy for calibration and improves the accuracy, for example the axis order and the data range.
Is the axis order image_num, image_channel, height, width, or something else? Is the data range [0, 1] or [0, 255]? Should it match the input of the .pth model, or be restricted to a fixed range? Thank you for any suggestions or help.
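Following the earlier advice to apply the same preprocessing as for inference, the layout and value range should be whatever the original model was trained and run with, not a fixed range. Below is a minimal sketch, assuming an NCHW input named "input" (as in the loader above) and a [0, 1] scaling. The scale, mean, and std defaults are placeholders, and preprocess and calibration_batches are hypothetical helper names. It takes already-decoded images so it does not depend on any particular image library.

```python
import numpy as np

def preprocess(img, scale=1.0 / 255.0, mean=0.0, std=1.0):
    """Convert an HxW or HxWxC uint8 image into a 1xCxHxW float32 batch.

    scale, mean, and std are placeholders: use the training pipeline's values.
    """
    if img.ndim == 2:
        img = img[:, :, np.newaxis]          # (H, W) -> (H, W, 1)
    x = img.astype(np.float32) * scale       # e.g. [0, 255] -> [0, 1]
    x = (x - mean) / std                     # same normalization as inference
    x = np.transpose(x, (2, 0, 1))[np.newaxis, ...]  # (H, W, C) -> (1, C, H, W)
    return np.ascontiguousarray(x, dtype=np.float32)

def calibration_batches(images):
    """Yield one feed dict per decoded image."""
    for img in images:
        yield {"input": preprocess(img)}

# Synthetic grayscale image, used only to demonstrate shapes and ranges.
gray = np.full((4, 6), 255, dtype=np.uint8)
batch = next(calibration_batches([gray]))["input"]
print(batch.shape, batch.dtype)
```

A load_data() function in data_loader.py could decode each file (for example with cv2.imread, as in the original loader) and yield from calibration_batches. The script can then be passed to Polygraphy with `polygraphy convert model.onnx --int8 --data-loader-script data_loader.py -o model.trt`. Using the same preprocess function at inference time keeps the two paths consistent.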