cyrusbehr / tensorrt-cpp-api

TensorRT C++ API Tutorial
MIT License

Batch Size Greater Than 1 Causes "Error, not all required dimensions specified" in Custom TensorRT Implementation #80

Closed pavelgrigoriev closed 2 weeks ago

pavelgrigoriev commented 3 weeks ago

Description:

I am working on a custom model using a class I named HypNet, which is based on the original YoloV8 implementation and uses TensorRT for inference. The model's input is reshaped to [40, 1, 1280]. When I attempt to run the model with a batch size of 2 (or any batch size greater than 1), I encounter the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Error, not all required dimensions specified.

However, if I set both optBatchSize and batchSize to 1, the error does not occur, and I can successfully obtain results.
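
For reference, my understanding is that with a dynamic batch dimension the execution context needs a fully specified input shape at inference time, roughly along these lines (a sketch only, using the setInputShape call and member names from engine.cpp; the real surrounding code may differ):

    // Sketch only: with a dynamic batch dimension every dim must be concrete,
    // e.g. [2, 40, 1, 1280] for batch size 2 and my [40, 1, 1280] input.
    nvinfer1::Dims4 inputDims{batchSize, 40, 1, 1280};
    if (!m_context->setInputShape(m_IOTensorNames[i].c_str(), inputDims)) {
        throw std::runtime_error("Error, failed to set the input shape");
    }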

Initialization with Batch Size 2 (Error Occurs):

HypNet::HypNet(const std::string& onnxModelPath, const HypNetConfig& config)
    : CLASS_NAMES(config.classNames) {
    Options options;
    options.optBatchSize = 2;
    options.maxBatchSize = 2;
    options.precision = config.precision;
    options.calibrationDataDirectoryPath = config.calibrationDataDirectory;

    if (options.precision == Precision::INT8 && options.calibrationDataDirectoryPath.empty()) {
        throw std::runtime_error("Error: Must supply calibration data path for INT8 calibration");
    }

    m_trtEngine = std::make_unique<Engine<float>>(options);
    auto succ = m_trtEngine->buildLoadNetwork(onnxModelPath, SUB_VALS, DIV_VALS, NORMALIZE);
    if (!succ) {
        throw std::runtime_error("Error: Unable to build or load the TensorRT engine.");
    }
}
std::vector<std::vector<cv::cuda::GpuMat>> HypNet::preprocess(const std::string& lineFilePath) {
    const int batchSize = 2;
    cv::Mat line = loadLine(lineFilePath);
    cv::Mat reshaped_line = line.reshape(40, {1, 1280});
    cv::cuda::GpuMat gpuImg;
    gpuImg.upload(reshaped_line);
    std::vector<cv::cuda::GpuMat> batchInput;
    batchInput.reserve(batchSize);

    for (int i = 0; i < batchSize; ++i) {
        batchInput.push_back(gpuImg.clone());
    }

    std::vector<std::vector<cv::cuda::GpuMat>> inputs{std::move(batchInput)};
    return inputs;
}

Initialization with Batch Size 1 (Works as Expected):

    HypNet::HypNet(const std::string& onnxModelPath, const HypNetConfig& config)
        : CLASS_NAMES(config.classNames) {
        Options options;
        options.optBatchSize = 1;
        options.maxBatchSize = 1;
        options.precision = config.precision;
        options.calibrationDataDirectoryPath = config.calibrationDataDirectory;

        if (options.precision == Precision::INT8 && options.calibrationDataDirectoryPath.empty()) {
            throw std::runtime_error("Error: Must supply calibration data path for INT8 calibration");
        }

        m_trtEngine = std::make_unique<Engine<float>>(options);
        auto succ = m_trtEngine->buildLoadNetwork(onnxModelPath, SUB_VALS, DIV_VALS, NORMALIZE);
        if (!succ) {
            throw std::runtime_error("Error: Unable to build or load the TensorRT engine.");
        }
    }

Additional Information:

Expected Output: When using a batch size of 2, I expect the output to be in the format [2, 6, 1280].

Current Setup:

Repository Version: 5.0 (for the TensorRT implementation)
Custom Model Input Dimensions: [40, 1, 1280]

I have removed the cv::cuda::split(batchInput[img], input_channels); line from engine.h because my model has 40 channels. By doing this, I simulate having multiple batches (a small sanity check on the reshaped input is sketched after the snippet):

    for (int i = 0; i < batchSize; ++i) {
        batchInput.push_back(gpuImg.clone());
    }
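
To double-check that the reshape in preprocess gives the layout I intend, I added the following sanity check on my side (not part of the repo; it assumes loadLine() produces CV_32F data):

    // Sanity check: the reshaped line should be a 1x1280 matrix with 40 float
    // channels, i.e. 40 * 1 * 1280 float values in total.
    CV_Assert(reshaped_line.type() == CV_32FC(40));
    CV_Assert(reshaped_line.rows == 1 && reshaped_line.cols == 1280);
    CV_Assert(reshaped_line.total() * reshaped_line.channels() == 40 * 1 * 1280);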

model.zip


thomaskleiven commented 2 weeks ago

Can you confirm that it works as expected if you export the ONNX model with a fixed batch_size of 2?
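
(In case it's useful, one way to check on the C++ side whether the currently built engine has a dynamic batch dimension is something like the following sketch, assuming a TensorRT version with the name-based tensor API:)

    // Sketch: inspect the first IO tensor's shape on the deserialized engine.
    // A value of -1 in d[0] means the batch dimension is dynamic and must be
    // set at runtime; a fixed-batch export would show the concrete value (e.g. 2).
    const char *inputName = m_engine->getIOTensorName(0); // assumes index 0 is the input
    nvinfer1::Dims dims = m_engine->getTensorShape(inputName);
    std::cout << "Batch dim: " << dims.d[0] << std::endl;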

pavelgrigoriev commented 2 weeks ago

Can you confirm that it works as expected if you export the ONNX model with a fixed batch_size of 2?

Now the model inference runs, but an error is thrown from the following line in engine.cpp:

if (input.size() != 1 || input[0].size() != 1)

The program terminates with the following error message:

terminate called after throwing an instance of 'std::logic_error'
  what():  The feature vector has incorrect dimensions!

This occurs when applying the transformOutput function.

std::vector<Object> HypNet::detectObjects(std::string lineFilePath) {
    int batchSize = 2;
    auto input = preprocess(lineFilePath);
    std::vector<std::vector<std::vector<float>>> featureVectors;
    double totalInferenceTimeMs = 0.0;

    // Simulate running inference on x lines
    for (int i = 0; i < 1; ++i) {
        // Start measuring time
        auto start = std::chrono::high_resolution_clock::now();

        auto succ = m_trtEngine->runInference(input, featureVectors);

        // Stop measuring time
        auto end = std::chrono::high_resolution_clock::now();

        // Calculate the duration for this inference
        std::chrono::duration<double> duration = end - start;
        double inferenceTimeMs = duration.count() * 1000.0;
        totalInferenceTimeMs += inferenceTimeMs;

        if (!succ) {
            throw std::runtime_error("Error: Unable to run inference on line " + std::to_string(i + 1));
        }
    }
    qDebug() << "Total inference time: " << totalInferenceTimeMs << " ms";

    auto outputDims = m_trtEngine->getOutputDims();

    qDebug() << "Output dimensions: ";
    for (const auto& dim : outputDims) {
        qDebug() << dim.d[0] << "x" << dim.d[1] << "x" << dim.d[2] << "x" << dim.d[3];
    }

    if (outputDims.size() != 1 || outputDims[0].d[0] != batchSize || outputDims[0].d[2] != 1 || outputDims[0].d[3] != 1280) {
        throw std::runtime_error("Error: Unexpected output dimensions.");
    }

    std::vector<float> featureVector;
    Engine<float>::transformOutput(featureVectors, featureVector);

    std::vector<std::vector<int>> all_predicted_classes(batchSize, std::vector<int>(1280));

    for (int b = 0; b < batchSize; ++b) {
        for (int i = 0; i < 1280; ++i) {
            float max_val = featureVectors[b][0][i]; // Start with the first class
            int max_idx = 0;
            for (int j = 1; j < 6; ++j) { // Iterate over the 6 classes
                if (featureVectors[b][0][j * 1280 + i] > max_val) {
                    max_val = featureVectors[b][0][j * 1280 + i];
                    max_idx = j;
                }
            }
            all_predicted_classes[b][i] = max_idx;
        }
    }
    std::vector<Object> detectedObjects;
    return detectedObjects;
}
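
As far as I can tell, the check input.size() != 1 || input[0].size() != 1 means the single-vector transformOutput only accepts one batch and one output, so for batch size 2 I presumably have to flatten the output per sample myself, something like this (a sketch, assuming featureVectors is laid out as [batch][output][values]):

    // Sketch: flatten each batch sample's outputs into one contiguous vector,
    // instead of calling the single-sample transformOutput (which requires batch == 1).
    std::vector<std::vector<float>> perSampleOutputs;
    perSampleOutputs.reserve(featureVectors.size());
    for (const auto &sample : featureVectors) {      // one entry per batch element
        std::vector<float> flat;
        for (const auto &output : sample) {          // one entry per output tensor
            flat.insert(flat.end(), output.begin(), output.end());
        }
        perSampleOutputs.push_back(std::move(flat)); // e.g. 7680 values = 6 * 1280
    }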

thomaskleiven commented 2 weeks ago

It seems that the issue might be related to how the input tensor shapes are set during inference. By modifying the lines around L72 in the following way:

m_context->setInputShape(m_IOTensorNames[i].c_str(), inputDims);

to

int inputIndex = m_engine->getBindingIndex(m_IOTensorNames[i].c_str());

if (!m_engine->bindingIsInput(inputIndex)) {
    spdlog::error("Binding {} is not an input!", inputIndex);
    return false;
}

// Set the binding dimensions for the input
if (!m_context->setBindingDimensions(inputIndex, inputDims)) {
    spdlog::error("Failed to set binding dimensions for input {}", inputIndex);
    return false;
}

I get featureVectors of shape 2x1x7680, which can be reshaped to 2x6x1280. I believe this is the output shape you’re expecting.

I'm not entirely sure yet why this change works differently from the previous implementation, but I’ll look into it further. Could you please try this on your end and see if it resolves the issue?

pavelgrigoriev commented 2 weeks ago

I tried it like this:

        nvinfer1::Dims4 inputDims = {batchSize, dims.d[0], dims.d[1], dims.d[2]};
        m_context->setInputShape(m_IOTensorNames[i].c_str(),
                                 inputDims); // Define the batch size
        int inputIndex = m_engine->getBindingIndex(m_IOTensorNames[i].c_str());

        if (!m_engine->bindingIsInput(inputIndex)) {
            std::cout << "Binding " << inputIndex << " is not an input!" << std::endl;
            return false;
        }

        // Set the binding dimensions for the input
        if (!m_context->setBindingDimensions(inputIndex, inputDims)) {
            std::cout << "Failed to set binding dimensions for input " << inputIndex << std::endl;
            return false;
        }

or

        nvinfer1::Dims4 inputDims = {batchSize, dims.d[0], dims.d[1], dims.d[2]};
        int inputIndex = m_engine->getBindingIndex(m_IOTensorNames[i].c_str());

        if (!m_engine->bindingIsInput(inputIndex)) {
            std::cout << "Binding " << inputIndex << " is not an input!" << std::endl;
            return false;
        }

        // Set the binding dimensions for the input
        if (!m_context->setBindingDimensions(inputIndex, inputDims)) {
            std::cout << "Failed to set binding dimensions for input " << inputIndex << std::endl;
            return false;
        }
        m_context->setInputShape(m_IOTensorNames[i].c_str(),
                                 inputDims); // Define the batch size

but it had no effect for me.

thomaskleiven commented 2 weeks ago

The code on the i-80 branch includes these changes, resulting in the following output:

./build/run_inference_benchmark --onnx_model ./models/model.onnx 
[2024-08-26 11:39:34.462] [warning] LOG_LEVEL environment variable not set. Using default log level (info).
[2024-08-26 11:39:34.478] [info] Engine name: model.engine.Orin.fp16.2.2
[2024-08-26 11:39:34.478] [info] Searching for engine file with name: ./model.engine.Orin.fp16.2.2
[2024-08-26 11:39:34.478] [info] Engine found, not regenerating...
[2024-08-26 11:39:34.478] [info] Loading TensorRT engine file at path: ./model.engine.Orin.fp16.2.2
[2024-08-26 11:39:34.556] [info] Loaded engine size: 11 MiB
[2024-08-26 11:39:34.583] [info] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +10, now: CPU 0, GPU 10 (MiB)
[2024-08-26 11:39:34.584] [info] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +3, now: CPU 0, GPU 13 (MiB)
[2024-08-26 11:39:34.594] [info] Attempting to reshape the matrix to have 40 channels, 1 height, and 1280 width
[2024-08-26 11:39:34.594] [info] Reshaped desired matrix to have 40 channels, 1 height, and 1280 width
[2024-08-26 11:39:34.594] [info] Warming up the network...
[2024-08-26 11:39:34.870] [info] Feature vectors shape: 2x1x7680
[2024-08-26 11:39:34.870] [info] Running benchmarks (1000 iterations)...
[2024-08-26 11:39:36.693] [info] Benchmarking complete!
[2024-08-26 11:39:36.693] [info] ======================
[2024-08-26 11:39:36.693] [info] Avg time per sample: 
[2024-08-26 11:39:36.693] [info] Avg time per sample: 0.9115 ms
[2024-08-26 11:39:36.693] [info] Batch size: 2
[2024-08-26 11:39:36.693] [info] ======================

[2024-08-26 11:39:36.693] [info] Batch 0, output 0
[2024-08-26 11:39:36.693] [info] 4.476562 4.285156 4.230469 4.371094 4.699219 4.828125 4.921875 4.167969 4.406250 4.726562 ...
[2024-08-26 11:39:36.693] [info] Batch 1, output 0
[2024-08-26 11:39:36.693] [info] 4.476562 4.285156 4.230469 4.371094 4.699219 4.828125 4.921875 4.167969 4.406250 4.726562 ...

Does this give the results you're expecting?

pavelgrigoriev commented 2 weeks ago

Not the result I expected, unfortunately.

Output dimensions: 40 x 6 x 1 x 1280
terminate called after throwing an instance of 'std::logic_error'
  what():  The feature vector has incorrect dimensions!

thomaskleiven commented 2 weeks ago

Didn't you mention that the output should be (batch_size, 6, 1, 1280)? In the case on branch i-80 the batch_size is set to 2.

pavelgrigoriev commented 2 weeks ago

I just tried different batch sizes (2, 20, 40) and converted the ONNX model with batch sizes of 2, 20, and 40.

thomaskleiven commented 2 weeks ago

In the example above I ran it with your original model with dynamic batch size.

pavelgrigoriev commented 2 weeks ago

Total inference time: 26.1466 ms
Output dimensions: -1 x 6 x 1 x 1280
terminate called after throwing an instance of 'std::logic_error'
  what():  The feature vector has incorrect dimensions!
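
If I understand correctly, the -1 is the engine-level dynamic batch dimension reported by getOutputDims(); the runtime-resolved shape would presumably have to be queried from the execution context after the input shape is set. A sketch of what that could look like inside the engine code (assuming the name-based TensorRT API and that the output is IO tensor index 1):

    // Sketch: ask the execution context (not the engine) for the output shape once
    // the input shape has been set; the batch dim should then be concrete (e.g. 2).
    const char *outputName = m_engine->getIOTensorName(1); // assumption: output is tensor 1
    nvinfer1::Dims outDims = m_context->getTensorShape(outputName);
    std::cout << outDims.d[0] << " x " << outDims.d[1] << " x "
              << outDims.d[2] << " x " << outDims.d[3] << std::endl;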

thomaskleiven commented 2 weeks ago

I’m not sure why it’s not working with your structure. I’ll keep the i-80 branch active for now in case you want to use it for debugging. It’s set up with dynamic batch size for the model you provided.

pavelgrigoriev commented 2 weeks ago

I’m not sure why it’s not working with your structure. I’ll keep the i-80 branch active for now in case you want to use it for debugging. It’s set up with dynamic batch size for the model you provided.

Okay, thank you very much anyway!