Closed: isra60 closed this issue 4 years ago
You can use the TensorRT Python API, and doing the order conversion in Python is easy.
If you're looking to do it in C++, I believe there's an example of converting to NCHW here (they also happen to be subtracting the mean at the same time): https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleFasterRCNN/sampleFasterRCNN.cpp#L281-L293
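For reference, the kind of conversion those lines perform can be sketched in plain C++ (names and the mean values here are illustrative, not taken from the sample):

```cpp
#include <cstddef>
#include <vector>

// Convert an interleaved HWC image (what cv::imread hands you, as raw bytes)
// into planar CHW while subtracting a per-channel mean, similar in spirit to
// the sampleFasterRCNN preprocessing loop.
std::vector<float> hwcToChw(const unsigned char* src, int h, int w, int c,
                            const float* mean)
{
    std::vector<float> dst(static_cast<std::size_t>(c) * h * w);
    for (int ch = 0; ch < c; ++ch)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                // src is interleaved (pixel-major); dst is planar (channel-major)
                dst[(ch * h + y) * w + x] =
                    static_cast<float>(src[(y * w + x) * c + ch]) - mean[ch];
    return dst;
}
```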
Hi, I am new to C++ and TensorRT. I tried to modify "sampleOnnxMNIST.cpp" to load a MobileNetV1 model and classify my own data, but the output values were wrong. Below is my code. Can anyone help? (BTW: I used keras2onnx to convert .h5 to .onnx.)
OS: Win10 (Visual Studio 2017), GPU: 2080 Ti, TensorRT version: 7.1.3.4, CUDA version: 11.0, cuDNN version: 8.0.2, onnx version: 1.6.0
bool SampleOnnxMNIST::processInput(const samplesCommon::BufferManager& buffers)
{
    const int inputC = mInputDims.d[1];
    const int inputH = mInputDims.d[2];
    const int inputW = mInputDims.d[3];
    const int batchSize = mParams.batchSize;

    cv::Mat origin_image = cv::imread("D:/TensorRT/TensorRT-7.1.3.4/data/test1.jpg", 1);
    if (!origin_image.data)
    {
        cerr << "Error: could not load image." << endl;
        return false;
    }

    cv::Mat resize_image;
    // Note: cv::Size takes (width, height), and the interpolation flag is the
    // sixth argument of cv::resize, not the fourth.
    cv::resize(origin_image, resize_image, cv::Size(inputW, inputH), 0, 0, cv::INTER_CUBIC);

    // Fill the host buffer in NCHW order, swapping BGR -> RGB ("2 - c")
    // and scaling each byte to [0, 1] before inverting it.
    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
    for (int i = 0, volImg = inputC * inputH * inputW; i < batchSize; ++i)
    {
        for (int c = 0; c < inputC; ++c)
        {
            for (unsigned j = 0, volChl = inputH * inputW; j < volChl; ++j)
                hostDataBuffer[i * volImg + c * volChl + j]
                    = 1.0 - float(resize_image.data[j * inputC + 2 - c]) / 255.0;
        }
    }
    return true;
}
If anyone is looking for this:
bool SampleUffSSD::processInput(const samplesCommon::BufferManager& buffers)
{
    const int inputH = mInputDims.d[1];
    const int inputW = mInputDims.d[2];
    const int batchSize = mParams.batchSize;

    // Available images
    std::vector<std::string> imageList = {"test.jpeg"};
    mPPMs.resize(batchSize);
    assert(mPPMs.size() <= imageList.size());

    cv::Mat image;
    for (int i = 0; i < batchSize; ++i)
    {
        readImage(locateFile(imageList[i], mParams.dataDirs), image);
    }

    // Host memory for input buffer: write in NCHW order, swapping BGR -> RGB
    // and scaling each value from [0, 255] to [-1, 1].
    // Note: the per-image stride must cover all three channels (3 * H * W).
    float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
    for (int i = 0, volImg = 3 * inputH * inputW; i < batchSize; ++i)
    {
        for (unsigned j = 0, volChl = inputH * inputW; j < inputH; ++j)
        {
            for (unsigned k = 0; k < inputW; ++k)
            {
                cv::Vec3b bgr = image.at<cv::Vec3b>(j, k);
                hostDataBuffer[i * volImg + 0 * volChl + j * inputW + k] = (2.0 / 255.0) * float(bgr[2]) - 1.0;
                hostDataBuffer[i * volImg + 1 * volChl + j * inputW + k] = (2.0 / 255.0) * float(bgr[1]) - 1.0;
                hostDataBuffer[i * volImg + 2 * volChl + j * inputW + k] = (2.0 / 255.0) * float(bgr[0]) - 1.0;
            }
        }
    }
    return true;
}
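As a quick sanity check on the scaling above — assuming the network expects inputs in [-1, 1], which is typical for MobileNet/SSD backbones — the mapping (2/255)·v − 1 sends 0 to −1 and 255 to 1:

```cpp
#include <cmath>

// Illustrative helper: the [0, 255] -> [-1, 1] scaling used in the loop above.
inline float scaleToUnitRange(unsigned char v)
{
    return (2.0f / 255.0f) * static_cast<float>(v) - 1.0f;
}
```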
Hi all, I wrote a little blog article with some CUDA kernels to do this on the GPU.
Sometimes your preprocessing pipeline is already on the GPU and you do not want to copy the data back to the CPU.
Perhaps it could be an idea to add this kind of HWC to/from NCHW functionality to OpenCV DNN.
Inside the function toNCHWKernel, what does blockIdx refer to? Can you please provide the full code snippet?
Hello @kostastsing,
In fact I wrote this kernel, but you could get a similar result using split/merge channels, and the timings are quite similar on recent GPUs.
Regards,
David
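(For what it's worth, blockIdx is the CUDA built-in variable giving the index of the current thread block within the grid.) To make the split/merge idea concrete: splitting an interleaved HWC image into per-channel planes and laying those planes out back-to-back is exactly an HWC→CHW conversion. Below is a stdlib-only sketch of that idea with illustrative names; with OpenCV you would use cv::split and copy each plane into the destination buffer instead:

```cpp
#include <vector>

// Split an interleaved HWC image into one plane per channel
// (what cv::split does for a cv::Mat).
std::vector<std::vector<float>> splitChannels(const std::vector<float>& hwc,
                                              int h, int w, int c)
{
    std::vector<std::vector<float>> planes(c, std::vector<float>(h * w));
    for (int i = 0; i < h * w; ++i)
        for (int ch = 0; ch < c; ++ch)
            planes[ch][i] = hwc[i * c + ch];
    return planes;
}

// Concatenating the planes back-to-back yields the planar CHW layout.
std::vector<float> concatPlanes(const std::vector<std::vector<float>>& planes)
{
    std::vector<float> chw;
    for (const auto& p : planes)
        chw.insert(chw.end(), p.begin(), p.end());
    return chw;
}
```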
Is there any example of how to use TensorRT with an OpenCV Mat?
I know there is an ordering problem:
TensorRT requires your image data to be in NCHW order, but OpenCV reads it in NHWC order.
I think sampleFasterRCNN.cpp addresses this issue.
But I think an example or tutorial would be better, as OpenCV is such a popular library.