Closed VitalyVaryvdin closed 1 day ago
Hi @VitalyVaryvdin!
I'd try debugging and comparing each step against the Python pipeline to see where the problem comes from. Is the final output consistent with the fast-plate-ocr Python output?
I'm not familiar with CV-CUDA, but when I run the following C++ code using onnxruntime, it works well:
#include <cstdint>
#include <iostream>
#include <limits>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <onnxruntime/onnxruntime_cxx_api.h>
cv::Mat preprocess_image(const cv::Mat &input_image, int img_height, int img_width) {
cv::Mat gray_image, resized_image, final_image;
// convert to grayscale
cv::cvtColor(input_image, gray_image, cv::COLOR_BGR2GRAY);
// resize image
cv::resize(gray_image, resized_image, cv::Size(img_width, img_height));
// uint8 format
resized_image.convertTo(final_image, CV_8U);
// add batch dimension and channel dim
final_image = final_image.reshape(1, {1, img_height, img_width, 1});
return final_image;
}
// postprocess model output
std::string postprocess_output(const std::vector<float> &output, int max_plate_slots, const std::string &alphabet) {
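// cast once so the index arithmetic below stays in int (avoids signed/unsigned comparison)
const int alphabet_len = static_cast<int>(alphabet.size());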
std::string plate;
for (int i = 0; i < max_plate_slots; ++i) {
float max_val = -std::numeric_limits<float>::infinity();
int max_idx = 0;
for (int j = 0; j < alphabet_len; ++j) {
if (output[i * alphabet_len + j] > max_val) {
max_val = output[i * alphabet_len + j];
max_idx = j;
}
}
plate += alphabet[max_idx];
}
return plate;
}
int main(int argc, char *argv[]) {
const std::string model_path = "./assets/arg_cnn_ocr_synth.onnx";
const std::string image_path = "./assets/test_plate_1.png";
const std::string alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_";
const int max_plate_slots = 7;
const int img_height = 70;
const int img_width = 140;
// init ONNX Runtime
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
Ort::SessionOptions session_options;
Ort::Session session(env, model_path.c_str(), session_options);
// read and preprocess image
cv::Mat input_image = cv::imread(image_path);
if (input_image.empty()) {
std::cerr << "Failed to read image: " << image_path << std::endl;
return 1;
}
cv::Mat processed_image = preprocess_image(input_image, img_height, img_width);
// create input tensor
Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
std::vector<int64_t> input_shape = {1, img_height, img_width, 1};
std::vector<uint8_t> input_tensor_values(processed_image.begin<uint8_t>(), processed_image.end<uint8_t>());
Ort::Value input_tensor = Ort::Value::CreateTensor<uint8_t>(memory_info, input_tensor_values.data(),
input_tensor_values.size(), input_shape.data(),
input_shape.size());
// define input and output nodes
const char *input_node_names[] = {"input"};
const char *output_node_names[] = {"concatenate"};
// run model
auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_node_names, &input_tensor, 1, output_node_names,
1);
std::vector<float> output_tensor_values(output_tensors.front().GetTensorMutableData<float>(),
output_tensors.front().GetTensorMutableData<float>() +
max_plate_slots * alphabet.size());
// postprocess output
std::string plate = postprocess_output(output_tensor_values, max_plate_slots, alphabet);
std::cout << "Recognized plate: " << plate << std::endl;
return 0;
}
The plate AD799KB is predicted correctly, and the result matches the Python output very closely.
Can you please check the memory size of your input tensor in the code above?
Update:
I tried the code you shared; the input tensor memory size is 9800 bytes in my case, as it should be (1x70x140x1). The code using CVCUDA reports tensor stride0 (buffer size) as 11200 and stride1 (row stride) as 160, and 160x70 gives exactly 11200.
Reading the raw buffer as an L8 image: a row stride of 140 gives a messed-up image, while a row stride of 160 gives the license plate image.
It seems my image is actually stored as 160x70 in memory, which gives the wrong results.
I kind of solved the issue, though in a hacky way:
nvcv::TensorDataStridedCuda::Buffer inBuf;
inBuf.strides[3] = sizeof(uint8_t);
inBuf.strides[2] = 1 * inBuf.strides[3];
inBuf.strides[1] = 140 * inBuf.strides[2];
inBuf.strides[0] = 70 * inBuf.strides[1];
CHECK_CUDA_ERROR(cudaMallocAsync(&inBuf.basePtr, 1 * inBuf.strides[0], stream));
nvcv::Tensor::Requirements inReqs = nvcv::Tensor::CalcRequirements(1, {140, 70}, nvcv::FMT_U8);
nvcv::TensorDataStridedCuda inData(nvcv::TensorShape{inReqs.shape, inReqs.rank, inReqs.layout}, nvcv::DataType{inReqs.dtype}, inBuf);
nvcv::Tensor resizeTensor = TensorWrapData(inData);
Allocating the tensor manually with explicit stride sizes solved the issue, but I'm not sure why CVCUDA allocated the wrong strides in the first place.
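For readers hitting the same mismatch, here is a minimal CPU-side sketch of the other way around the problem: keep the padded buffer and copy it row by row into a tightly packed one before handing it to the model. The helper name `pack_rows` is hypothetical and not part of CVCUDA or fast-plate-ocr; the numbers 160, 140 and 70 are the ones reported in this thread. On the GPU, `cudaMemcpy2D` with a source pitch of 160 and a destination pitch of 140 should do the same job, though that variant is untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a row-padded (pitched) single-channel image
// into a tightly packed buffer of width * height bytes.
// `src_row_stride` is the byte distance between the starts of two rows.
std::vector<uint8_t> pack_rows(const uint8_t* src, std::size_t src_row_stride,
                               std::size_t width, std::size_t height) {
    assert(src_row_stride >= width);  // padding can only make rows longer
    std::vector<uint8_t> packed(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Copy only the `width` meaningful bytes of each row; skip the padding.
        std::memcpy(packed.data() + y * width, src + y * src_row_stride, width);
    }
    return packed;
}
```

Calling `pack_rows(buffer, 160, 140, 70)` on the 11200-byte buffer yields the 9800 bytes the 1x70x140x1 input expects.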
Thanks for the help, your ONNX snippet helped me a lot!
Glad you fixed it!
Hi, this issue is probably out of scope for the repo, but I've been struggling with it for a whole day now and thought you might be able to give me a hint as to whether I'm doing something wrong when pre- and post-processing the model data.
What I've got in a nutshell:
C++, TensorRT, CVCUDA
Model: 9 slots, 24-char alphabet
Model input: 1, 70, 140, 1
Model output: 1, 216
YOLOv8 does the license plate detection, then I crop the license plate and pass it to fast-plate-ocr.
Then I do brute-force post-processing like this:
However, what I get is completely different from the result of Python inference, whether through fast-plate-ocr itself or through manual inference with post-processing pretty much copied from your script.
The recognized plate is completely wrong.
Here's an example of an image saved from inputLayerTensor (the pre-processed image passed to fast-plate-ocr inference):
Sample output for the image: T351CT15_. This is also hugely unstable, jumping from one prediction to another all the time.
My best guess is that the cause is the strange model input dims (1, 70, 140, 1), which suggest interleaved data is expected since a channel dimension is present. The tensor I create has an NHWC shape, but the data is purely planar (FMT_U8 data type).
Would love to hear back if you have any ideas, thanks!
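On the interleaved-versus-planar guess above, a small sketch may help rule it in or out. For a tightly packed buffer with a single channel, NHWC (interleaved) and NCHW (planar) indexing resolve to the same byte offsets, so the layout label alone should not change the data. The function names below are made up for illustration, and the check assumes no row padding, which is a separate concern.

```cpp
#include <cassert>
#include <cstddef>

// Linear offset of element (n, h, w, c) in a tightly packed NHWC tensor.
std::size_t offset_nhwc(std::size_t n, std::size_t h, std::size_t w, std::size_t c,
                        std::size_t H, std::size_t W, std::size_t C) {
    assert(h < H && w < W && c < C);
    return ((n * H + h) * W + w) * C + c;
}

// Linear offset of the same element in a tightly packed NCHW tensor.
std::size_t offset_nchw(std::size_t n, std::size_t h, std::size_t w, std::size_t c,
                        std::size_t H, std::size_t W, std::size_t C) {
    assert(h < H && w < W && c < C);
    return ((n * C + c) * H + h) * W + w;
}

// True when both layouts place every element of one image at the same offset.
bool layouts_match(std::size_t H, std::size_t W, std::size_t C) {
    for (std::size_t h = 0; h < H; ++h)
        for (std::size_t w = 0; w < W; ++w)
            for (std::size_t c = 0; c < C; ++c)
                if (offset_nhwc(0, h, w, c, H, W, C) != offset_nchw(0, h, w, c, H, W, C))
                    return false;
    return true;
}
```

`layouts_match(70, 140, 1)` holds, while `layouts_match(70, 140, 3)` does not, so for this grayscale model the planar data is most likely not the culprit and the strides are the better place to look.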