Open bbhxwl opened 3 months ago
Can you share the results from both the C# and the Python runs?
The difference could have several causes, for example differences in image preprocessing or inference parameters.
I will rewrite a demo tonight, but have you been successful with YOLO5?
python
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5x', pretrained=True)
model.eval()
rs = model('/Users/xuzhibin/Downloads/6ee927a0d4f2c9862a918798de175f5.jpg')
rs.print()
detections = rs.xyxy[0]
for *box, conf, cls in detections:
    print(f"Detected {model.names[int(cls)]} with confidence {conf:.2f} at [{box[0]:.2f}, {box[1]:.2f}, {box[2]:.2f}, {box[3]:.2f}]")
result
Detected person with confidence 0.74 at [63.35, 0.31, 253.65, 152.87]
Detected cup with confidence 0.70 at [0.03, 141.83, 30.46, 180.19]
Detected person with confidence 0.35 at [1.25, 9.21, 252.73, 337.44]
C#
// See https://aka.ms/new-console-template for more information
using System.Drawing;
using ConsoleApp1;
using Microsoft.ML.OnnxRuntime;
InferenceSession session = new InferenceSession("/Users/xuzhibin/Downloads/yolov5x.onnx");
List<NamedOnnxValue> inputs = new List<NamedOnnxValue>();
Stream stream = new FileStream("/Users/xuzhibin/Downloads/6ee927a0d4f2c9862a918798de175f5.jpg", FileMode.Open);
inputs.Add(NamedOnnxValue.CreateFromTensor<float>("images", Test.PreprocessImage(stream)));
var results = session.Run(inputs);
var output = results.First().AsTensor<float>();
// Copy each candidate row (85 values) out of the output tensor
var boxes = new List<float[]>();
for (int i = 0; i < output.Dimensions[1]; i++)
{
    var boxData = new float[85];
    for (int j = 0; j < 85; j++)
    {
        boxData[j] = output[0, i, j];
    }
    boxes.Add(boxData);
}
// Largest value at index 4 across all rows
var m = boxes.Max(s => s[4]);
// Same check again, collecting the index-4 values into a list first
List<float> ll = new List<float>();
foreach (var box in boxes)
{
    float confidence = box[4];
    ll.Add(confidence);
}
var sasd = ll.Max();
Console.WriteLine();
using Microsoft.ML.OnnxRuntime.Tensors;
using SkiaSharp;
namespace ConsoleApp1;
public class Test
{
    public static Tensor<float> PreprocessImage(Stream stream)
    {
        int targetWidth = 640; // YOLOv5 input size is usually 640x640
        int targetHeight = 640;
        // Use SkiaSharp for image processing
        using (SKBitmap skBitmap = SKBitmap.Decode(stream))
        using (SKBitmap resizedBitmap = skBitmap.Resize(new SKImageInfo(targetWidth, targetHeight), SKFilterQuality.High))
        {
            // Convert the image pixels to a float array
            float[] imageData = new float[targetWidth * targetHeight * 3]; // 3 because of the three RGB channels
            int index = 0;
            for (int y = 0; y < resizedBitmap.Height; y++)
            {
                for (int x = 0; x < resizedBitmap.Width; x++)
                {
                    SKColor pixel = resizedBitmap.GetPixel(x, y);
                    // Normalize pixel values to the 0-1 range
                    imageData[index++] = pixel.Red / 255.0f;
                    imageData[index++] = pixel.Green / 255.0f;
                    imageData[index++] = pixel.Blue / 255.0f;
                }
            }
            // Convert the data to a Tensor<float>
            var dimensions = new[] { 1, 3, targetHeight, targetWidth }; // batch size is 1
            return new DenseTensor<float>(imageData, dimensions);
        }
    }
}
The returned data is completely different. Very strange.
hello
You're writing the data in channels-last format [height,width,channels], but YOLOv5 wants channels-first [channels,height,width]. So your image is corrupted when YOLOv5 sees it. You need to change how you write the image data into the tensor in C#.
var dimensions = new[] { 1, 3, targetHeight, targetWidth };
Are you referring to this code? How should I modify it?
update var dimensions = new[] { 1, 3, targetWidth,targetHeight };
I don't understand which part of my code you're referring to. What went wrong?
You're writing out the elements with the channels in the last dimension, but you then construct the tensor telling it the channels are the first dimension. It can't do the reshape for you because it doesn't know you wrote the data out in the wrong order.
You should modify your for loop to have three nested loops: the outermost over channels, then height, then width. Write out a single colour in the innermost loop and you'll get the right data layout. There's probably an easier way to do it, but I'm not familiar with the tooling in C#.
channel-first encoding: [B, C, H, W] (in RGB order)
inputTensor = [img[i, j].R for i, j in img] + [img[i, j].G for i, j in img] + [img[i, j].B for i, j in img]
channel-last encoding: [B, H, W, C] (in RGB order)
inputTensor = [c for i, j in img for c in (img[i, j].R, img[i, j].G, img[i, j].B)]
You're writing the data in channels-last format [height,width,channels], but YOLOv5 wants channels-first [channels,height,width]
Yeah, that's probably why you didn't get reasonable results from the YOLO model in C#. I would also check whether YOLOv5 expects its input in RGB order.
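To make the two layouts concrete, here is a minimal Python sketch of the loop order suggested above (channels, then height, then width), run on a made-up 2x2 image. The helper names `to_channel_last` and `to_channel_first` are hypothetical and only illustrate the indexing; they are not part of ONNX Runtime or YOLOv5.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def to_channel_last(image: List[List[Pixel]]) -> List[float]:
    """Interleaved layout: R, G, B of pixel 0, then R, G, B of pixel 1, ..."""
    return [value / 255.0 for row in image for pixel in row for value in pixel]


def to_channel_first(image: List[List[Pixel]]) -> List[float]:
    """Planar layout: all R values, then all G values, then all B values."""
    height, width = len(image), len(image[0])
    flat = []
    for channel in range(3):        # outermost loop: channels
        for y in range(height):     # then rows
            for x in range(width):  # innermost loop: one colour per write
                flat.append(image[y][x][channel] / 255.0)
    return flat


# Made-up 2x2 image: red, green on the first row; blue, white on the second.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

interleaved = to_channel_last(image)
planar = to_channel_first(image)
print(interleaved)  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
print(planar)       # [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```

Both lists hold the same twelve values, only in a different order. Labelling the interleaved buffer as `{ 1, 3, H, W }` does not reorder anything, which is why the model would see a scrambled image.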
The following seems to work, but is there a simpler way to do this in C#? Surely there is an existing method for it?
public static Tensor<float> PreprocessImage(Stream stream)
{
    int targetWidth = 640; // YOLOv5 input size is usually 640x640
    int targetHeight = 640;
    // Use SkiaSharp for image processing
    using (SKBitmap skBitmap = SKBitmap.Decode(stream))
    using (SKBitmap resizedBitmap = skBitmap.Resize(new SKImageInfo(targetWidth, targetHeight), SKFilterQuality.High))
    {
        // Convert the image pixels to a float array, stored as [channels, height, width]
        float[] imageData = new float[3 * targetWidth * targetHeight]; // 3 because of the three RGB channels
        int indexR = 0;
        int indexG = targetWidth * targetHeight;
        int indexB = 2 * targetWidth * targetHeight;
        for (int y = 0; y < resizedBitmap.Height; y++)
        {
            for (int x = 0; x < resizedBitmap.Width; x++)
            {
                SKColor pixel = resizedBitmap.GetPixel(x, y);
                // Normalize pixel values to the 0-1 range
                imageData[indexR++] = pixel.Red / 255.0f;
                imageData[indexG++] = pixel.Green / 255.0f;
                imageData[indexB++] = pixel.Blue / 255.0f;
            }
        }
        // Convert the data to a Tensor<float>
        var dimensions = new[] { 1, 3, targetHeight, targetWidth }; // batch size is 1, channels first
        return new DenseTensor<float>(imageData, dimensions);
    }
}
Maybe take a look at this api? https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.extractpixels?view=ml-dotnet
Why do the following models give different results when the same method is used? Is it a C# issue?
How can I write C# code that produces the same results as Python?
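One possible contributor, offered as an assumption rather than a confirmed diagnosis for this thread: the Python snippet prints post-processed detections (`rs.xyxy`), while the C# code reads the raw ONNX output rows. In the usual YOLOv5 export each row is `[cx, cy, w, h, objectness, class scores...]`, so matching the Python numbers would need at least the decoding step sketched below, followed by non-maximum suppression (not shown). The function name `decode_rows` and the sample rows are hypothetical, and 0.25 is assumed here as the confidence threshold.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, confidence, class id)
Detection = Tuple[float, float, float, float, float, int]


def decode_rows(rows: List[List[float]], conf_threshold: float = 0.25) -> List[Detection]:
    """Turn raw [cx, cy, w, h, objectness, class scores...] rows into xyxy detections."""
    detections = []
    for row in rows:
        cx, cy, w, h, objectness = row[:5]
        class_scores = row[5:]
        # Pick the best class and combine its score with the objectness.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        conf = objectness * class_scores[class_id]
        if conf < conf_threshold:
            continue
        # Convert centre/size to corner coordinates.
        detections.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf, class_id))
    return detections


# Two made-up rows with 3 classes instead of 80, to keep the example short.
rows = [
    [100.0, 50.0, 40.0, 20.0, 0.9, 0.1, 0.8, 0.1],  # kept: 0.9 * 0.8 = 0.72, class 1
    [10.0, 10.0, 4.0, 4.0, 0.2, 0.5, 0.3, 0.2],     # dropped: 0.2 * 0.5 = 0.10
]
detections = decode_rows(rows)
print(detections)
```

The same loop translates directly to C# over the `boxes` list. Comparing only `box[4]` against the Python confidences is not like-for-like, because index 4 is the objectness alone under this assumption.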