NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Problem when using IInt8EntropyCalibrator2 with multiple inputs #2825

Open vogaliccb opened 1 year ago

vogaliccb commented 1 year ago

Description

I am using IInt8EntropyCalibrator2 with a network that has 7 inputs, so in Int8EntropyCalibrator2::getBatch I cudaMemcpy all 7 inputs into the bindings, like this:

    CUDA_CHECK(cudaMemcpy(device_input_1_, input1, INPUT_SIZE_1_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[0], INPUT_1_BLOB_NAME_C));
    bindings[0] = device_input_1_;

    CUDA_CHECK(cudaMemcpy(device_input_2_, input2, INPUT_SIZE_2_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[1], INPUT_2_BLOB_NAME_C));
    bindings[1] = device_input_2_;

    CUDA_CHECK(cudaMemcpy(device_input_3_, input3, INPUT_SIZE_3_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[2], INPUT_3_BLOB_NAME_C));
    bindings[2] = device_input_3_;

    CUDA_CHECK(cudaMemcpy(device_input_4_, input4, INPUT_SIZE_4_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[3], INPUT_4_BLOB_NAME_C));
    bindings[3] = device_input_4_;

    CUDA_CHECK(cudaMemcpy(device_input_5_, input5, INPUT_SIZE_5_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[4], INPUT_5_BLOB_NAME_C));
    bindings[4] = device_input_5_;

    CUDA_CHECK(cudaMemcpy(device_input_6_, input6, INPUT_SIZE_6_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[5], INPUT_6_BLOB_NAME_C));
    bindings[5] = device_input_6_;

    CUDA_CHECK(cudaMemcpy(device_input_7_, input7, INPUT_SIZE_7_C * sizeof(float), cudaMemcpyHostToDevice));
    assert(!strcmp(names[6], INPUT_7_BLOB_NAME_C));
    bindings[6] = device_input_7_;

But it reports:

[03/28/2023-18:37:35] [E] [TRT] 2: [helpers.h::divUp::70] Error Code 2: Internal Error (Assertion n > 0 failed. )
[03/28/2023-18:37:35] [E] [TRT] 3: [engine.cpp::~Engine::306] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::306, condition: mObjectCounter.use_count() == 1. Destroying an engine object before destroying objects it created leads to undefined behavior.
)
[03/28/2023-18:37:35] [E] [TRT] 2: [calibrator.cpp::calibrateEngine::1177] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
build engine done

How can I solve this problem? With a single input everything works fine.

Environment

TensorRT Version: 8.5.10
CUDA Version: 11.4

zerollzeng commented 1 year ago

Looks like a usage issue to me. I would suggest using Polygraphy so that you don't need to implement the calibrator yourself.

see https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt

A sample usage:

polygraphy convert model.onnx --int8 --data-loader-script ./data_loader.py --calib-base-cls IInt8EntropyCalibrator2 -o model.plan --calibration-cache model.cache
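
For a multi-input model, the data loader script only needs to yield one feed_dict per calibration batch, with one entry per network input. A minimal sketch of such a data_loader.py, assuming placeholder input names and shapes (use your model's real input names, shapes, and actual calibration samples instead of random data):

    # data_loader.py -- sketch of a Polygraphy data loader for a multi-input model
    import numpy as np

    def load_data():
        # Polygraphy calls load_data() and iterates over the feed_dicts it yields.
        for _ in range(100):  # number of calibration batches
            yield {
                # Placeholder names and shapes; they must match the ONNX model's inputs.
                "input_1": np.random.rand(1, 3, 224, 224).astype(np.float32),
                "input_2": np.random.rand(1, 16).astype(np.float32),
                # ... one entry per network input
            }
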
vogaliccb commented 1 year ago

But I cannot install Polygraphy on the DRIVE AGX Orin X.

vogaliccb commented 1 year ago

How do I use a multi-input calibrator in C++? Int8EntropyCalibrator2 works fine when the network has only one input.
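
For reference, a minimal sketch of what a complete multi-input calibrator can look like in C++ (the member names and the way the host data is organised are assumptions, not taken from this issue). The key points are that bindings[i] must be filled in the same order as names[i], and that getBatch must return false once the calibration data is exhausted:

    #include <cstdint>
    #include <vector>
    #include <cuda_runtime_api.h>
    #include <NvInfer.h>

    class MultiInputCalibrator : public nvinfer1::IInt8EntropyCalibrator2
    {
    public:
        int32_t getBatchSize() const noexcept override { return 1; }

        bool getBatch(void* bindings[], const char* names[], int32_t nbBindings) noexcept override
        {
            if (mBatchIdx >= mNumBatches)
                return false;  // no calibration data left
            for (int32_t i = 0; i < nbBindings; ++i)
            {
                // Copy the i-th input of the current batch to its device buffer
                // and expose the device pointer in the slot matching names[i].
                cudaMemcpy(mDeviceBuffers[i], mHostData[mBatchIdx][i], mInputBytes[i], cudaMemcpyHostToDevice);
                bindings[i] = mDeviceBuffers[i];
            }
            ++mBatchIdx;
            return true;
        }

        const void* readCalibrationCache(size_t& length) noexcept override { length = 0; return nullptr; }
        void writeCalibrationCache(const void*, size_t) noexcept override {}

    private:
        // Assumed members, to be filled by the constructor from your own data pipeline.
        int32_t mBatchIdx{0};
        int32_t mNumBatches{0};
        std::vector<void*> mDeviceBuffers;                  // one cudaMalloc'd buffer per input
        std::vector<std::vector<const void*>> mHostData;    // [batch][input] -> host pointer
        std::vector<size_t> mInputBytes;                    // byte size of each input tensor
    };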

AdanWang commented 1 year ago

I have a Python code sample that may help you. Give me your email so I can send it to you.

vogaliccb commented 1 year ago

I have a Python code sample that may help you. Give me your email so I can send it to you.

Thank you! rainwish@alumni.sjtu.edu.cn

vogaliccb commented 1 year ago

I have a Python code sample that may help you. Give me your email so I can send it to you.

I used the Python code, but I still have this problem.

vogaliccb commented 1 year ago

[image]

7788zun commented 11 months ago

I have a Python code sample that may help you. Give me your email so I can send it to you.

I would like to have a try. Could you please send the sample to me? Thanks!

7788zun commented 11 months ago

I have a Python code sample that may help you. Give me your email so I can send it to you.

I would like to have a try. Could you please send the sample to me? Thanks!

1017255004@qq.com

JiaoxianDu commented 9 months ago

Did you guys finally solve the problem? I ran into a similar problem when using the Python API:

def get_batch(self, names):
    try:
        # Assume self.batches is a generator that provides batch data.
        data = next(self.batches)
        # Assume that self.device_input is a device buffer allocated by the constructor.
        cuda.memcpy_htod(self.device_input, data)
        return [int(self.device_input)]
    except StopIteration:
        # When we're out of batches, we return either [] or None.
        # This signals to TensorRT that there is no calibration data remaining.
        return None

I just don't know how to arrange multiple inputs; I suppose it's fine when there is a single input. I also tried Polygraphy, which works fine, but it lacks flexibility, for example when you want to set the precision of individual layers.
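
For what it's worth, the usual pattern for multiple inputs with the Python API is to return one device pointer per input, in the same order as the names argument, and to return None once the data is exhausted. A minimal sketch, assuming self.batches yields a dict of host arrays keyed by input name and self.device_inputs is a dict of device buffers allocated in the constructor (both attribute names are placeholders):

    import pycuda.driver as cuda

    def get_batch(self, names):
        try:
            # self.batches is assumed to yield {input_name: np.ndarray} per batch.
            data = next(self.batches)
            pointers = []
            for name in names:  # keep the order in which TensorRT lists the inputs
                cuda.memcpy_htod(self.device_inputs[name], data[name])
                pointers.append(int(self.device_inputs[name]))
            return pointers
        except StopIteration:
            # No calibration data left; returning None tells TensorRT to stop.
            return None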

Egorundel commented 2 months ago

@zerollzeng I have the same problem, can you help me, please?

https://github.com/NVIDIA/TensorRT/issues/4053