NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

OpenCV input image to TensorRT engine. Example or Tutorial?? #348

Closed isra60 closed 4 years ago

isra60 commented 4 years ago

Is there any example of how to use TensorRT with an OpenCV Mat?

I know there is an ordering problem:

TensorRT requires your image data to be in NCHW order, but OpenCV reads images in NHWC order.
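For reference, OpenCV's dnn module can already do this repacking on the CPU. A minimal sketch (makeNchwBlob is just an illustrative name; it assumes a float32 network input with the usual [0, 1] scaling):

#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

// Returns a contiguous float32 NCHW blob ready to copy into the TensorRT
// input buffer. inputW/inputH are the engine's input dimensions.
cv::Mat makeNchwBlob(const cv::Mat& image, int inputW, int inputH)
{
    return cv::dnn::blobFromImage(
        image,
        1.0 / 255.0,               // scale factor (applied after mean subtraction)
        cv::Size(inputW, inputH),  // resize to the network input size
        cv::Scalar(),              // no mean subtraction
        /*swapRB=*/true,           // OpenCV loads BGR; most models expect RGB
        /*crop=*/false);
}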

So I think sampleFasterRCNN.cpp addresses this issue.

But an example or tutorial would be better, as OpenCV is such a popular library.

ShawnNew commented 4 years ago

You can use the TensorRT Python API; doing the order conversion in Python is easy.

rmccorm4 commented 4 years ago

If you're looking to do it in C++, I believe there's an example of converting to NCHW here (they also happen to be subtracting the mean at the same time): https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleFasterRCNN/sampleFasterRCNN.cpp#L281-L293

Source: https://stackoverflow.com/a/45858916/10993413
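Distilled, the conversion in that sample looks roughly like this (a paraphrase, not the sample verbatim; nhwcToNchw and pixelMean are illustrative names):

// NHWC bytes (e.g. cv::Mat::data) -> NCHW floats with per-channel mean
// subtraction; "2 - c" swaps BGR to RGB on the fly.
void nhwcToNchw(const unsigned char* hwc, float* chw,
                int batchSize, int inputC, int inputH, int inputW,
                const float* pixelMean)
{
    const int volChl = inputH * inputW;  // per-channel stride
    const int volImg = inputC * volChl;  // per-image stride
    for (int i = 0; i < batchSize; ++i)
        for (int c = 0; c < inputC; ++c)
            for (int j = 0; j < volChl; ++j)
                chw[i * volImg + c * volChl + j] =
                    float(hwc[i * volImg + j * inputC + 2 - c]) - pixelMean[c];
}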

Scottchou7 commented 4 years ago

Hi, I am new to C++ and TensorRT. I tried to modify "sampleOnnxMNIST.cpp" to load a MobileNetV1 and classify my own data, but the output values were wrong. Below is my code; can anyone help? (BTW: I used keras2onnx to convert the .h5 model to .onnx.)

OS: Win10 (Visual Studio 2017)
GPU: 2080 Ti
TensorRT version: 7.1.3.4
CUDA version: 11.0
cuDNN version: 8.0.2
ONNX version: 1.6.0

bool SampleOnnxMNIST::processInput(const samplesCommon::BufferManager& buffers)
{
    const int inputC = mInputDims.d[1];
    const int inputH = mInputDims.d[2];
    const int inputW = mInputDims.d[3];
    const int batchSize = mParams.batchSize;

    cv::Mat origin_image = cv::imread("D:/TensorRT/TensorRT-7.1.3.4/data/test1.jpg", 1);
    if (!origin_image.data)
    {
        cerr << "Error: could not load image." << endl;
        return false;
    }

    // cv::Size is (width, height), and the interpolation flag is the sixth
    // argument of cv::resize (the fourth and fifth are the fx/fy scale factors).
    cv::Mat resize_image;
    cv::resize(origin_image, resize_image, cv::Size(inputW, inputH), 0, 0, cv::INTER_CUBIC);

    // Fill the host buffer in NCHW order; "2 - c" swaps OpenCV's BGR to RGB.
    // Note: "1.0 - x / 255" inverts intensities (MNIST-style preprocessing);
    // it may not match the normalization MobileNetV1 was trained with.
    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
    for (int i = 0, volImg = inputC * inputH * inputW; i < batchSize; ++i)
    {
        for (int c = 0; c < inputC; ++c)
        {
            for (unsigned j = 0, volChl = inputH * inputW; j < volChl; ++j)
                hostDataBuffer[i * volImg + c * volChl + j] = 1.0 - float(resize_image.data[j * inputC + 2 - c] / 255.);
        }
    }
    return true;
}
aunsid commented 3 years ago

If anyone is looking for this:

bool SampleUffSSD::processInput(const samplesCommon::BufferManager& buffers)
{
    const int inputH = mInputDims.d[1];
    const int inputW = mInputDims.d[2];
    const int batchSize = mParams.batchSize;

    // Available images
    std::vector<std::string> imageList = {"test.jpeg"};
    mPPMs.resize(batchSize);
    assert(mPPMs.size() <= imageList.size());

    // readImage is assumed to decode the file into a cv::Mat (8-bit BGR).
    cv::Mat image;
    for (int i = 0; i < batchSize; ++i)
    {
        readImage(locateFile(imageList[i], mParams.dataDirs), image);
    }

    // Host memory for input buffer
    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));

    // Write NCHW planes: volImg is the per-image stride (3 channels), volChl
    // the per-channel stride. Pixels are scaled to [-1, 1] and BGR is swapped
    // to RGB (bgr[2] goes into channel 0).
    for (int i = 0, volImg = 3 * inputH * inputW; i < batchSize; ++i)
    {
        for (int j = 0, volChl = inputH * inputW; j < inputH; ++j)
        {
            for (int k = 0; k < inputW; ++k)
            {
                cv::Vec3b bgr = image.at<cv::Vec3b>(j, k);
                hostDataBuffer[i * volImg + 0 * volChl + j * inputW + k] = (2.0 / 255.0) * float(bgr[2]) - 1.0;
                hostDataBuffer[i * volImg + 1 * volChl + j * inputW + k] = (2.0 / 255.0) * float(bgr[1]) - 1.0;
                hostDataBuffer[i * volImg + 2 * volChl + j * inputW + k] = (2.0 / 255.0) * float(bgr[0]) - 1.0;
            }
        }
    }
    return true;
}
Source: https://forums.developer.nvidia.com/t/custom-trained-ssd-inception-model-in-tensorrt-c-version/143048/14

r2d3 commented 1 year ago

Hi all, I wrote a little blog article with some CUDA kernels to do this on the GPU.

https://www.dotndash.net/2023/03/09/using-tensorrt-with-opencv-cuda.html#use-tensorrt-c-api-with-opencv

Sometimes your preprocessing pipeline is already on the GPU and you do not want to copy the data back to the CPU.

Perhaps it could be an idea to add this kind of HWC to/from NCHW functionality to OpenCV DNN.
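For illustration, a minimal sketch of such a kernel (in the spirit of the kernels in the article, not the article's exact code; hwcToChw is an illustrative name, and blockIdx/threadIdx are the CUDA built-ins that locate a thread within the launch grid):

// HWC (interleaved, packed rows) -> CHW (planar) on the GPU.
// One thread per (x, y) pixel; the channel loop runs per thread.
// A real GpuMat source may have padded rows (step != width * channels * 4),
// which this packed-layout sketch ignores.
__global__ void hwcToChw(const float* __restrict__ src, float* __restrict__ dst,
                         int width, int height, int channels)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    const int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (x >= width || y >= height)
        return;
    for (int c = 0; c < channels; ++c)
        dst[c * height * width + y * width + x] = src[(y * width + x) * channels + c];
}

// Launch with a 2D grid of 2D blocks covering the image:
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   hwcToChw<<<grid, block>>>(d_src, d_dst, width, height, 3);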

kostastsing commented 5 months ago

> Hi all, I wrote a little blog article with some CUDA kernels to do this on the GPU.
>
> https://www.dotndash.net/2023/03/09/using-tensorrt-with-opencv-cuda.html#use-tensorrt-c-api-with-opencv
>
> Sometimes your preprocessing pipeline is already on the GPU and you do not want to copy the data back to the CPU.
>
> Perhaps it could be an idea to add this kind of HWC to/from NCHW functionality to OpenCV DNN.

Inside the function toNCHWKernel, what does blockIdx refer to? Can you please provide the full code snippet?

r2d3 commented 5 months ago

Hello @kostastsing

In fact I wrote this kernel, but you could get a similar result using split/merge on the channels, and the timings are quite similar on recent GPUs.
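A sketch of that alternative (toPlanar is an illustrative name; it assumes a CV_32FC3 source and a caller-provided cudaMalloc'd buffer, since GpuMat's own allocator may pad rows):

#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaarithm.hpp>  // cv::cuda::split

// Split a CV_32FC3 GpuMat into three planes laid out back-to-back in one
// device buffer of 3*h*w floats, i.e. the CHW layout TensorRT expects for
// a single image. The planes alias slices of the buffer with an unpadded
// step, so cv::cuda::split writes the planar layout directly.
void toPlanar(const cv::cuda::GpuMat& src, float* chwDevice)
{
    const int h = src.rows, w = src.cols;
    const size_t step = w * sizeof(float);  // unpadded row pitch
    std::vector<cv::cuda::GpuMat> planes{
        cv::cuda::GpuMat(h, w, CV_32F, chwDevice, step),
        cv::cuda::GpuMat(h, w, CV_32F, chwDevice + h * w, step),
        cv::cuda::GpuMat(h, w, CV_32F, chwDevice + 2 * h * w, step)};
    cv::cuda::split(src, planes);  // each channel lands in its plane
}

The same device pointer can then be passed directly as the TensorRT input binding, with no copy back to the CPU.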

Regards

David