NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Polygraphy: How to write the data_loader.py to send the calibration data? #4196

Open Kongsea opened 1 month ago

Kongsea commented 1 month ago

The example data_loader.py file uses fake data. I want to know how to write the file so that it feeds image file data to Polygraphy to calibrate the model and improve accuracy.

For example, the axis order, the data range, and so on. Is the axis order (image_num, image_channel, height, width) or something else? Is the data range [0, 1] or [0, 255]? Should it match the .pth model's input, or be restricted to a fixed range?

Thank you for any suggestions or help.
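For reference, a Polygraphy data loader is just a generator that yields feed dicts keyed by the model's input tensor name, with arrays in the same layout (NCHW here), dtype, and value range as the framework model's inputs. Below is a minimal sketch; the input name `"input"`, the 224x224 shape, and the mean/std constants are placeholder assumptions, and random data stands in for images that would normally be decoded with e.g. `cv2.imread`.

```python
import numpy as np

# Placeholder assumptions: the input tensor name, shape, and
# normalization must match whatever the original model expects.
INPUT_NAME = "input"    # hypothetical input tensor name
H, W = 224, 224         # hypothetical model input size
MEAN, STD = 0.5, 0.25   # hypothetical normalization constants

def load_data(num_samples=8):
    """Yield feed dicts for Polygraphy calibration.

    In a real loader each array would come from a decoded image file
    preprocessed exactly as during training/inference; random data
    stands in here so the sketch is self-contained.
    """
    rng = np.random.default_rng(0)
    for _ in range(num_samples):
        # Simulated decoded image: HWC, uint8, values in [0, 255]
        img = rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8)
        # Same preprocessing as the framework model: scale to [0, 1],
        # normalize, then HWC -> NCHW with a leading batch axis.
        x = (img.astype(np.float32) / 255.0 - MEAN) / STD
        x = np.transpose(x, (2, 0, 1))[np.newaxis]  # (1, 3, H, W)
        yield {INPUT_NAME: np.ascontiguousarray(x)}
```

The key point is that the preprocessing inside the loader should be identical to the preprocessing used at inference time, not a separate fixed convention.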

Kongsea commented 1 month ago

Using `trtexec --onnx=model.onnx --saveEngine=model.trt --int8` without calibration data to quantize the model, I get a TRT model that runs inference but produces a low-precision image.

However, using `polygraphy convert model.onnx --int8 -o model.trt` without calibration data, I get a TRT model whose output is abnormal, with very small numbers.

Then I wrote a data_loader.py so that polygraphy could quantize the ONNX model with calibration data, but the output is very similar to the no-calibration case. I am very confused.

```python
import cv2
import numpy as np

# `images` is the list of calibration image file paths.
def load_data():
    for image in images:
        img = cv2.imread(image, 0)  # grayscale, HxW, uint8 in [0, 255]
        if len(img.shape) == 2:
            img = np.expand_dims(img, axis=2)  # -> HxWx1
        # add batch axis, then transpose HWC -> NCHW
        img = (np.transpose(np.ascontiguousarray(np.expand_dims(img, axis=0)), (0, 3, 1, 2))).astype(np.float16)
        yield {
            "input": img
        }
```
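For context, a loader like the one above is typically passed to Polygraphy as a script exposing a `load_data` function (the filenames here are placeholders; this assumes the `--data-loader-script` option of `polygraphy convert`):

```shell
# Hypothetical filenames; polygraphy looks up load_data() in the script.
polygraphy convert model.onnx --int8 \
    --data-loader-script ./data_loader.py \
    -o model.trt
```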
yuanyao-nv commented 1 month ago

I think the trtexec and polygraphy commands should be doing the same thing. Not sure why they are giving different results. cc: @pranavm-nvidia

pranavm-nvidia commented 1 month ago

trtexec will initialize the dynamic ranges to fixed values while polygraphy will calibrate on the input data (if none is provided, then it would be synthetic data). How many images are you using for calibration?
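To illustrate the difference: calibration derives a per-tensor dynamic range from observed activation statistics, while an uncalibrated trtexec build just assumes fixed ranges. A toy max-abs sketch of the data flow (not TensorRT's actual entropy calibrator, which instead minimizes information loss):

```python
import numpy as np

def max_abs_dynamic_range(batches):
    """Toy max-abs calibration: track the largest absolute activation
    value seen across calibration batches and use it as the dynamic
    range. TensorRT's default IInt8EntropyCalibrator2 is smarter,
    but the data flow is the same."""
    amax = 0.0
    for batch in batches:
        amax = max(amax, float(np.abs(batch).max()))
    return amax

# With real data the range adapts to the activations...
data_range = max_abs_dynamic_range([np.array([0.1, -2.5]), np.array([1.0])])
# ...whereas a fixed initialization picks a constant regardless of the data.
```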

Kongsea commented 1 month ago

This is the output using `--fp16` of trtexec without calibration: [screenshot]

The following is using `--int8` of trtexec without calibration: [screenshot]

The following is using `--best` of trtexec without calibration: [screenshot]

The following is using `--int8` of trtexec with INT8 calibration data: [screenshot]

So I want to know whether this is caused by an incorrect calibration data generation method.

Now, when using polygraphy, an error is raised:

```
[E] 1: [calibrator.cpp::add::798] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [executionContext.cpp::commonEmitDebugTensor::1517] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a972910'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a97a4d0'.)
........................
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7e2c0d90'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a89aa90'.)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
.......
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 2: [calibrator.cpp::calibrateEngine::1222] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
```

However, it worked well before, and I didn't modify anything.

Kongsea commented 1 month ago

> trtexec will initialize the dynamic ranges to fixed values while polygraphy will calibrate on the input data (if none is provided, then it would be synthetic data). How many images are you using for calibration?

I have tried using 500, 1000, and more than 3000 images to calibrate the model; however, the result is almost the same.

pranavm-nvidia commented 1 month ago

Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.
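Following this advice, the loader from earlier in the thread would yield float32 instead of float16. A hedged sketch (the `"input"` name and grayscale preprocessing are carried over from the earlier snippet; a stand-in decoded array replaces `cv2.imread` so the sketch is self-contained):

```python
import numpy as np

def load_data(images):
    """FP32 calibration loader: same layout and preprocessing as
    before, but yielding float32, since FP16 mode is disabled while
    TensorRT calibrates."""
    for image in images:
        # In practice: img = cv2.imread(image, 0); here `image` is
        # already a decoded HxW uint8 array so the sketch runs anywhere.
        img = image
        if img.ndim == 2:
            img = img[:, :, np.newaxis]                  # -> HxWx1
        x = np.transpose(img[np.newaxis], (0, 3, 1, 2))  # -> NCHW
        yield {"input": np.ascontiguousarray(x).astype(np.float32)}
```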

Kongsea commented 1 month ago

> Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.

I used FP16 when training the network. So do I need to use FP32 to calibrate the model when I quantize it? Thank you.

pranavm-nvidia commented 1 month ago

I believe so. We disable FP16 mode when calibrating.

The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.

Kongsea commented 1 month ago

> I believe so. We disable FP16 mode when calibrating.
>
> The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.

OK. Thank you. I will have a try.