dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

YOLOv5: why does the same model give different results in C# and Python? Is it a C# issue? #7215

Open bbhxwl opened 3 months ago

bbhxwl commented 3 months ago

Why does the same model, used the same way, give different results in C# and Python? Is it a C# issue?

How can I write C# code that behaves the same as the Python code?

// See https://aka.ms/new-console-template for more information

using System.Drawing;
using ConsoleApp3;
using Microsoft.ML.OnnxRuntime;

InferenceSession session = new InferenceSession("D:\\yolov5\\yolov5x.onnx");
List<NamedOnnxValue> inputs = new List<NamedOnnxValue>();
var bit=(Bitmap)Bitmap.FromFile("C:\\Users\\47013\\Desktop\\2.jpeg");
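// "images" is the input name of the exported YOLOv5 ONNX model; PreprocessImage is the poster's own helper (not shown)
inputs.Add(NamedOnnxValue.CreateFromTensor<float>("images", test.PreprocessImage(bit)));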

var results = session.Run(inputs);
var output=results.First().AsTensor<float>();
var boxes = new List<float[]>();

// Each candidate row holds 85 values: box (cx, cy, w, h), objectness, then 80 class scores
for (int i = 0; i < output.Dimensions[1]; i++)
{
    var boxData = new float[85];
    for (int j = 0; j < 85; j++)
    {
        boxData[j] = output[0, i, j];
    }
    boxes.Add(boxData);
}

var asd=test.DrawBoundingBoxes(bit,boxes,0.1f);
asd.Save("test.jpg");
var m=boxes.Max(s => s[4]);
List<float> ll = new List<float>();
foreach (var box in boxes)
{
    float confidence = box[4];
    ll.Add(confidence);
}
var sasd=ll.Max();
Console.WriteLine();

Python

import datetime

import torch

# Load the model
model = torch.hub.load('ultralytics/yolov5', 'yolov5x', pretrained=True)

# Set the model to evaluation mode
model.eval()

# Image file path
img_path = 'C:\\Users\\47013\\Desktop\\新建文件夹\\2.jpeg'  # replace with your own image path
start_time = datetime.datetime.now()
# Run inference
results = model(img_path)
print(datetime.datetime.now() - start_time)
# Parse the results
results.print()  # print the detection results
results.show()   # show the image with the detection boxes drawn

# Access the result data
detections = results.xyxy[0]  # detections for the first image, format (x1, y1, x2, y2, confidence, class)

# Print the details of each detection box
for *box, conf, cls in detections:
    print(f"Detected {model.names[int(cls)]} with confidence {conf:.2f} at [{box[0]:.2f}, {box[1]:.2f}, {box[2]:.2f}, {box[3]:.2f}]")

LittleLittleCloud commented 3 months ago

Can you share the results from both the C# and the Python runs?

The difference could have several causes, for example differences in image preprocessing or in the inference parameters.
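One concrete preprocessing difference to check: as far as I know, the Python hub model resizes while preserving the aspect ratio and pads the remainder (a "letterbox" resize), whereas a plain resize to 640x640 stretches non-square images. A minimal sketch of the letterbox geometry for a square 640x640 input, assuming the padding is split evenly on both sides; `letterbox_geometry` and `to_original` are hypothetical helper names, not YOLOv5 APIs:

```python
def letterbox_geometry(width, height, target=640):
    """Return (new_w, new_h, pad_left, pad_top, scale) for an aspect-preserving resize.

    Assumes a square target (e.g. a 640x640 ONNX input) with the padding
    split evenly between the two sides of the shorter dimension.
    """
    scale = min(target / width, target / height)  # fit the longer side
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top, scale


def to_original(x, y, pad_left, pad_top, scale):
    """Map a point from the padded model input back to original image pixels."""
    # Undo the padding first, then undo the scaling.
    return (x - pad_left) / scale, (y - pad_top) / scale


new_w, new_h, pad_left, pad_top, scale = letterbox_geometry(1280, 720)
print(new_w, new_h, pad_left, pad_top, scale)  # 640 360 0 140 0.5
```

Boxes predicted in the 640x640 input space need this inverse mapping before they can be compared with the pixel coordinates Python prints.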

bbhxwl commented 3 months ago


I will write a new demo tonight. In the meantime, have you managed to get YOLOv5 working?

bbhxwl commented 3 months ago


Python

import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5x', pretrained=True)
model.eval()
rs = model('/Users/xuzhibin/Downloads/6ee927a0d4f2c9862a918798de175f5.jpg')
rs.print()
detections = rs.xyxy[0]
for *box, conf, cls in detections:
    print(f"Detected {model.names[int(cls)]} with confidence {conf:.2f} at [{box[0]:.2f}, {box[1]:.2f}, {box[2]:.2f}, {box[3]:.2f}]")

Result

Detected person with confidence 0.74 at [63.35, 0.31, 253.65, 152.87]
Detected cup with confidence 0.70 at [0.03, 141.83, 30.46, 180.19]
Detected person with confidence 0.35 at [1.25, 9.21, 252.73, 337.44]
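Note that these Python detections are already post-processed: the hub model filters by confidence and runs non-maximum suppression (NMS) before filling results.xyxy, while the C# code looks at every raw candidate row. A minimal greedy NMS sketch in plain Python, class-agnostic for brevity, with boxes in (x1, y1, x2, y2) format; the 0.45 IoU threshold is an assumed value, not one taken from this thread:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: the second box overlaps the first
```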
bbhxwl commented 3 months ago


C#

// See https://aka.ms/new-console-template for more information

using System.Drawing;
using ConsoleApp1;
using Microsoft.ML.OnnxRuntime;

InferenceSession session = new InferenceSession("/Users/xuzhibin/Downloads/yolov5x.onnx");
List<NamedOnnxValue> inputs = new List<NamedOnnxValue>();
using Stream stream = new FileStream("/Users/xuzhibin/Downloads/6ee927a0d4f2c9862a918798de175f5.jpg", FileMode.Open); // 'using' disposes the stream when it goes out of scope

inputs.Add(NamedOnnxValue.CreateFromTensor<float>("images",Test.PreprocessImage(stream)));

var results = session.Run(inputs);
var output=results.First().AsTensor<float>();
var boxes = new List<float[]>();

for (int i = 0; i < output.Dimensions[1]; i++)
{
    var boxData = new float[85];
    for (int j = 0; j < 85; j++)
    {
        boxData[j] = output[0, i, j];
    }
    boxes.Add(boxData);
}

var m=boxes.Max(s => s[4]);
List<float> ll = new List<float>();
foreach (var box in boxes)
{
    float confidence = box[4];
    ll.Add(confidence);
}
var sasd=ll.Max();
Console.WriteLine();

The preprocessing helper:

using Microsoft.ML.OnnxRuntime.Tensors;
using SkiaSharp;

namespace ConsoleApp1;

public class Test
{
    public static Tensor<float> PreprocessImage(Stream stream)
    {
        int targetWidth = 640; // YOLOv5's input size is usually 640x640
        int targetHeight = 640;
        // Use SkiaSharp for image processing
        using (SKBitmap skBitmap = SKBitmap.Decode(stream))
        using (SKBitmap resizedBitmap = skBitmap.Resize(new SKImageInfo(targetWidth, targetHeight), SKFilterQuality.High))
        {
            // Convert the image pixels into a float array (R, G, B interleaved per pixel)
            float[] imageData = new float[targetWidth * targetHeight * 3]; // 3 for the three RGB channels
            int index = 0;

            for (int y = 0; y < resizedBitmap.Height; y++)
            {
                for (int x = 0; x < resizedBitmap.Width; x++)
                {
                    SKColor pixel = resizedBitmap.GetPixel(x, y);
                    // Normalize pixel values to the range 0-1
                    imageData[index++] = pixel.Red / 255.0f;
                    imageData[index++] = pixel.Green / 255.0f;
                    imageData[index++] = pixel.Blue / 255.0f;
                }
            }

            // Convert the data into a Tensor<float>
            var dimensions = new[] { 1, 3, targetHeight, targetWidth }; // batch size of 1
            return new DenseTensor<float>(imageData, dimensions);
        }
    }
}

The returned data is completely different. Very strange.
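Even with correct preprocessing, the raw ONNX rows are not directly comparable with Python's results.xyxy. Assuming the standard YOLOv5 export layout of 85 values per candidate (cx, cy, w, h in input-image pixels, an objectness score, then 80 class scores), here is a sketch of decoding one row into Python's (x1, y1, x2, y2, confidence, class) form; `decode_row` and the 0.25 threshold are assumptions for illustration:

```python
def decode_row(row, conf_threshold=0.25):
    """Decode one raw 85-value YOLOv5 row into (x1, y1, x2, y2, conf, cls), or None.

    Assumed layout: row[0:4] = cx, cy, w, h (model-input pixels),
    row[4] = objectness, row[5:] = per-class scores.
    """
    cx, cy, w, h, objectness = row[:5]
    class_scores = row[5:]
    cls = max(range(len(class_scores)), key=lambda i: class_scores[i])
    conf = objectness * class_scores[cls]  # final confidence = objectness * class score
    if conf < conf_threshold:
        return None
    # Convert centre/size to corner coordinates.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf, cls)


row = [100, 50, 20, 10, 0.8] + [0.0] * 80
row[5 + 2] = 0.5  # class index 2 scores 0.5
print(decode_row(row))
```

The resulting corners are still in the 640x640 input space, so they must be mapped back through the resize before comparing with Python, and overlapping candidates still need NMS.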

bbhxwl commented 3 months ago


hello

Craigacp commented 3 months ago

You're writing the data in channels-last format [height, width, channels], but YOLOv5 wants channels-first [channels, height, width]. So your image is corrupted by the time YOLOv5 sees it. You need to change how you write the image data into the tensor in C#.
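To see why the interleaved buffer counts as "corrupted", here is a small NumPy sketch on a hypothetical 2x2 image (not the poster's data): declaring a channels-last buffer to be channels-first moves no data, so each "channel" plane ends up mixing colours from different pixels.

```python
import numpy as np

# A hypothetical 2x2 RGB image, shape (height, width, channels).
hwc = np.arange(1, 13, dtype=np.float32).reshape(2, 2, 3)

# The original C# loop writes pixels one after another, R, G, B interleaved.
interleaved = hwc.reshape(-1)

# Declaring the buffer as (channels, height, width) does not reorder anything,
# so the first "channel" plane is simply the first four interleaved values.
misread_red = interleaved.reshape(3, 2, 2)[0]
true_red = hwc[:, :, 0]

print(misread_red)  # mixes R, G and B samples from different pixels
print(true_red)     # the actual red values of the four pixels
```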

bbhxwl commented 3 months ago


var dimensions = new[] { 1, 3, targetHeight, targetWidth };

Are you referring to this code? How should I modify it? Should I change it to:

var dimensions = new[] { 1, 3, targetWidth, targetHeight };

bbhxwl commented 3 months ago


I don't understand which part of my code you are referring to. What went wrong?

Craigacp commented 3 months ago

You're writing out the elements with the channels in the last dimension, but you then construct the tensor telling it the channels are the first dimension. It can't do the reshape for you because it doesn't know you wrote the data out in the wrong order.

You should restructure it into three nested loops: the outermost over channels, then height, then width. Write out a single colour in the innermost loop and you'll get the right data layout. There's probably an easier way to do it, but I'm not familiar with the tooling in C#.
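A minimal pure-Python sketch of the three-loop layout described above, assuming the pixels are available as (R, G, B) tuples of 0-255 values indexed as pixels[y][x]; `to_chw` is a hypothetical name:

```python
def to_chw(pixels, height, width):
    """Flatten an image into a channels-first buffer using three nested loops.

    `pixels[y][x]` is assumed to be an (R, G, B) tuple of 0-255 values.
    Channels are the outermost loop, so each colour plane is contiguous.
    """
    data = []
    for c in range(3):                # channel: 0 = R, 1 = G, 2 = B
        for y in range(height):       # row
            for x in range(width):    # column
                data.append(pixels[y][x][c] / 255.0)  # normalise to 0-1
    return data


pixels = [[(255, 0, 0), (0, 255, 0)],
          [(0, 0, 255), (255, 255, 255)]]
print(to_chw(pixels, 2, 2))  # R plane, then G plane, then B plane
```

In the C# helper this corresponds to writing each value at index c * H * W + y * W + x instead of appending R, G and B one after another.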

LittleLittleCloud commented 3 months ago

channel-first encoding: [B, C, H, W] (in RGB order)

inputTensor = [img[i,j].R for i, j in img] + [img[i,j].G for i, j in img] + [img[i,j].B for i, j in img]

channel-last encoding: [B, H, W, C] (in RGB order)

inputTensor = [v for i, j in img for v in (img[i,j].R, img[i,j].G, img[i,j].B)]

You're writing the data in channels-last format [height, width, channels], but YOLOv5 wants channels-first [channels, height, width]

Yes, that's probably why you didn't get reasonable results from the YOLO model in C#. I would also check whether YOLOv5 expects its input in RGB order.
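For comparison, in NumPy the whole channel-last to channel-first conversion is a single transpose. A minimal sketch, assuming the image is already an RGB uint8 array of shape (H, W, 3) resized to the model's input size; if it was loaded with OpenCV's cv2.imread, which returns BGR, reverse the channels first with img[..., ::-1]. The name `preprocess` is hypothetical:

```python
import numpy as np


def preprocess(rgb_uint8):
    """Turn an (H, W, 3) uint8 RGB array into a (1, 3, H, W) float32 tensor.

    Assumes the image is already RGB and already resized to the model's
    input size; no resizing or letterboxing happens here.
    """
    chw = rgb_uint8.transpose(2, 0, 1)             # (H, W, C) -> (C, H, W)
    chw = chw.astype(np.float32) / 255.0           # scale to 0-1
    return np.ascontiguousarray(chw[np.newaxis])   # add batch dim, C-contiguous memory


img = np.zeros((4, 6, 3), dtype=np.uint8)
img[..., 0] = 255  # a pure red image
print(preprocess(img).shape)  # (1, 3, 4, 6)
```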

bbhxwl commented 3 months ago


The following seems to work, but is there a simpler way in C#? There should be a ready-made method for this, right?

    public static Tensor<float> PreprocessImage(Stream stream)
    {
        int targetWidth = 640; // YOLOv5's input size is usually 640x640
        int targetHeight = 640;
        // Use SkiaSharp for image processing
        using (SKBitmap skBitmap = SKBitmap.Decode(stream))
        using (SKBitmap resizedBitmap = skBitmap.Resize(new SKImageInfo(targetWidth, targetHeight), SKFilterQuality.High))
        {
            // Convert the image pixels into a float array stored as [channels, height, width] (one plane per channel)
            float[] imageData = new float[3 * targetWidth * targetHeight]; // 3 for the three RGB channels
            int indexR = 0;
            int indexG = targetWidth * targetHeight;
            int indexB = 2 * targetWidth * targetHeight;

            for (int y = 0; y < resizedBitmap.Height; y++)
            {
                for (int x = 0; x < resizedBitmap.Width; x++)
                {
                    SKColor pixel = resizedBitmap.GetPixel(x, y);
                    // Normalize pixel values to the range 0-1
                    imageData[indexR++] = pixel.Red / 255.0f;
                    imageData[indexG++] = pixel.Green / 255.0f;
                    imageData[indexB++] = pixel.Blue / 255.0f;
                }
            }

            // Convert the data into a Tensor<float>
            var dimensions = new[] { 1, 3, targetHeight, targetWidth }; // batch size of 1, channels first
            return new DenseTensor<float>(imageData, dimensions);
        }
    }

LittleLittleCloud commented 3 months ago

Maybe take a look at this API: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.extractpixels?view=ml-dotnet