BobLd / YOLOv4MLNet

Use the YOLO v4 and v5 (ONNX) models for object detection in C# using ML.Net
MIT License

Tips for modifying this for YoloV5 #2

Closed: Transigent closed this issue 3 years ago

Transigent commented 3 years ago

Hi, I was excited to find this project. I hoped that YoloV4 and YoloV5 were similar enough that I could use it to run my YoloV5 ONNX-exported model, but apparently it's not as simple as changing the path to the model and recompiling... :) Unfortunately I don't know a lot about the key model properties that you use in this code.

The first thing I noted was that Netron reports different shapes for the inputs and outputs. The YoloV4 model input is shaped { 1, 416, 416, 3 } with outputs { 1, 52, 52, 3, 85 }, { 1, 26, 26, 3, 85 } and { 1, 13, 13, 3, 85 }, while the YoloV5 model (YOLOv5l) input is shaped { 1, 3, 640, 640 } with outputs { 1, 3, 80, 80, 19 }, { 1, 3, 40, 40, 19 } and { 1, 3, 20, 20, 19 }.
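
(The last dimension of these outputs is the number of classes plus 5, i.e. x, y, w, h and an objectness score, so 80 + 5 = 85 for the COCO-trained YoloV4 model and 14 + 5 = 19 for my 14-class YoloV5 model.)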

I changed the code to reflect these differences along with the column names for the inputs and outputs.

I also changed the anchor figures according to what I found in lines 8-10 under anchors here: https://github.com/ultralytics/yolov5/blob/master/models/yolov5l.yaml as well as the SHAPES constants.

I changed all references from 416 to 640, as that's the default pixel dimension.

It appeared to run but at the line var results = predict.GetResults(classesNames, 0.3f, 0.7f); I got 11,000 results. Clearly my changes were not enough.

I haven't changed the XYSCALE constants as I am not sure what they are.

Do you have any thoughts about how I might get this to work? Is it actually likely to work, i.e. is YoloV5 too different to work with this code at all?

Thanks for any tips.

BobLd commented 3 years ago

Hi @Transigent, do you have an onnx version of the model?

Transigent commented 3 years ago

Hi BobLd

Thanks very much for the response, I really appreciate it!

Here is an ONNX export of the original YOLOV5l model pretrained on COCO with the default 80 classes. Unlike my version with 14 classes it will work with everyday images, plus I haven't contaminated it with training. https://drive.google.com/drive/folders/18PpGCnQ4Ca4vRMBQbvSta3chqtuh2zXV?usp=sharing

Let me know if you need anything else.

BobLd commented 3 years ago

Thanks @Transigent.

A first comment concerning the input size: because the color dimension appears first in your model ({ 1, 3, 640, 640 }), I think you will need to set interleavePixelColors to false, giving something like this: .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "images", scaleImage: 1f / 255f, interleavePixelColors: false))

Also, can you give the link to the part where the image is pre-processed in the python library? I need it in order to understand if you need to scale the image by 1f / 255f or not.

I'll have a look soon at the rest.

EDIT: same remark for the output layers, the color channel comes first. I think the offset will need to be changed accordingly.
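
For readers following along, here is a minimal sketch of what the input side of such a pipeline could look like in ML.Net for a channels-first { 1, 3, 640, 640 } input. The column and tensor names ("bitmap", "images", "output"), the 1/255 scaling and the Fill resizing mode are assumptions; check them against your own export in Netron and against the rest of this thread:

    using System.Collections.Generic;
    using System.Drawing;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Transforms.Image;

    public class YoloV5BitmapData
    {
        [ColumnName("bitmap")]
        [ImageType(640, 640)]
        public Bitmap Image { get; set; }
    }

    public static class YoloV5PipelineSketch
    {
        public static ITransformer Build(MLContext mlContext, string onnxModelPath)
        {
            var pipeline = mlContext.Transforms
                // The resizing mode matters: Fill vs IsoPad should match how the model
                // was trained (see the discussion further down this thread).
                .ResizeImages(
                    outputColumnName: "bitmap",
                    imageWidth: 640,
                    imageHeight: 640,
                    resizing: ImageResizingEstimator.ResizingKind.Fill)
                // Channels-first (planar) layout for the { 1, 3, 640, 640 } input,
                // with pixel values scaled to [0, 1].
                .Append(mlContext.Transforms.ExtractPixels(
                    outputColumnName: "images",
                    inputColumnName: "bitmap",
                    scaleImage: 1f / 255f,
                    interleavePixelColors: false))
                .Append(mlContext.Transforms.ApplyOnnxModel(
                    outputColumnNames: new[] { "output" },
                    inputColumnNames: new[] { "images" },
                    modelFile: onnxModelPath));

            // Fit on an empty list just to materialise the transformer.
            return pipeline.Fit(mlContext.Data.LoadFromEnumerable(new List<YoloV5BitmapData>()));
        }
    }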

Transigent commented 3 years ago

Great, once again thanks so much for your thoughts. I will sit down with the code after work tomorrow when I get a second, and try to see how it works. I'm having trouble understanding it at a glance.

The closest things that might have relevance are here and here

Also the anchor constants are here

Again thanks for all your time!

BobLd commented 3 years ago

@Transigent Any progress on your side? If not, I'll try to have a look soon.

Transigent commented 3 years ago

Hi @BobLd , sorry I haven't been in touch, we had to farewell our cat due to illness. It has been difficult. When the situation permits I will get back to this.

Awakawaka commented 3 years ago

Hi, @BobLd, currently I'm struggling to do the same thing @Transigent was trying to do :( I'm using yolov4-tiny trained on my custom dataset. I've converted this model to ONNX format using a Python script I grabbed from this repository: https://github.com/Tianxiaomo/pytorch-YOLOv4. I don't really understand how the VectorType attribute works, and googling hasn't helped me at all. For example, my model output shape is [1, 2535, 1, 4]; how will this be represented in a one-dimensional array? I would really appreciate your help, and if needed I'll move this topic to another issue.

BobLd commented 3 years ago

Hi @Awakawaka,

You need your output as a float[]. Try creating a predictionEngine like I did here.

Then, if your output is of size [1, 2535, 1, 4], you can basically skip the 1s, leaving you with a 2D array of shape [2535, 4]. ML.Net will give you a 1D array representation of size 2535 x 4 = 10140.

My guess is that the size 4 is for a single bounding box (4 elements: x1, y1, x2, y2), and you have 2535 bounding boxes.

Try the following:

// output is your model's 1D prediction array of size 2535 * 4 = 10140
List<(float, float, float, float)> bboxes = new List<(float, float, float, float)>();
for (int i = 0; i < 2535 * 4; i+=4)
{
    bboxes.Add((output[i], output[i+1], output[i+2], output[i+3]));
}

=> bboxes will contain all your bounding box predictions. Then you will certainly need to do some post-processing of the bounding boxes coordinates.
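
As an illustration of how the VectorType attribute maps that flattened output in ML.Net, a prediction class could look roughly like the sketch below. The "output" column name is an assumption; check the actual output tensor name of your ONNX export in Netron.

    using Microsoft.ML.Data;

    public class YoloV4TinyPrediction
    {
        // ML.Net flattens the [1, 2535, 1, 4] tensor into a single vector
        // of 1 * 2535 * 1 * 4 = 10140 floats.
        [VectorType(1, 2535, 1, 4)]
        [ColumnName("output")]
        public float[] Output { get; set; }
    }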

Hope this helps

ESEricWare commented 3 years ago

Hi guys,

I'm trying to do the same. Is there any progress from your side?

Thanks a lot

BobLd commented 3 years ago

Hi all,

I've made some progress: the categories now seem to be correct, but not the bounding boxes yet. The part you are missing is that Detect() needs to be implemented in C# (it is not present in the ONNX model).

Please have a look here: https://github.com/BobLd/YOLOv4MLNet/tree/yolo-v5 for the latest progress.

Please find more info here:

post-processing is also done here:

Any help welcome on the missing bits!
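
For anyone who wants to help with the missing bits: the core of Detect() is a per-cell, per-anchor decode. The sketch below is based on my reading of the ultralytics implementation (models/yolo.py) rather than on code in this repo, so treat it as an assumption to verify; the strides and anchors must match the model being used.

    using System;

    public static class YoloV5Decode
    {
        // Sketch of the transform Detect() applies to one anchor in one grid cell.
        // tx, ty, tw, th are raw network outputs; stride is 8, 16 or 32 depending on
        // the output scale; anchorW/anchorH are the anchor sizes for that scale (pixels).
        public static (float x, float y, float w, float h) DecodeCell(
            float tx, float ty, float tw, float th,
            int gridX, int gridY, int stride,
            float anchorW, float anchorH)
        {
            float Sigmoid(float v) => 1f / (1f + (float)Math.Exp(-v));

            // Box centre: offset within the grid cell, scaled back to input-image pixels.
            float x = (Sigmoid(tx) * 2f - 0.5f + gridX) * stride;
            float y = (Sigmoid(ty) * 2f - 0.5f + gridY) * stride;

            // Box size: relative to the anchor for this output scale.
            float w = (float)Math.Pow(Sigmoid(tw) * 2f, 2) * anchorW;
            float h = (float)Math.Pow(Sigmoid(th) * 2f, 2) * anchorH;

            return (x, y, w, h);
        }
    }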

kellansteele commented 3 years ago

Hi, I'm trying to use YOLOv5 with ML.Net and have a few questions about where your project is so far!

  1. Say I was to export my YOLOv5s.pt model to ONNX format with the Detect() layer included (following ultralytics/yolov5#343 (comment)). In your code, have you implemented any of the Detect() layer so far? If so, where?
  2. Further down in ultralytics/yolov5#343 (comment), they mention the following:

In both cases, you do miss the following:

- filtering results with objectness lower than some threshold
- NMS
- conversion from xc, yc, w, h to x1, y1, x2, y2

I've had a look at your code and it looks like you've implemented both the filtering results with objectness and conversion steps but not NMS. Is this correct?

Thanks so much!
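
For reference, the conversion step in that list is the standard change from centre format to corner format; a minimal sketch, in whatever units the model outputs:

    // Convert a YOLO-style box (centre x, centre y, width, height)
    // into corner coordinates (x1, y1, x2, y2).
    static (float x1, float y1, float x2, float y2) XywhToXyxy(float xc, float yc, float w, float h)
    {
        return (xc - w / 2f, yc - h / 2f, xc + w / 2f, yc + h / 2f);
    }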

deanbennettdeveloper commented 3 years ago

I've managed to get something working, but my bounding boxes are slightly off all the time. Would this be the NMS routine too, maybe? I've taken the raw output by switching export=False and simply using this output (I did something similar with YoloV4).

As it happens, I'm not interested in the exact boxes for my application, just objects and their confidence rating, so this works perfectly for me and is super quick even on a CPU.

deanbennettdeveloper commented 3 years ago

What I'm seeing is exactly what is mentioned here, to be fair. So I think it's definitely that I'm missing a final set of steps with the bounding boxes, but like I say, I'm only interested in confidences and labels at the moment.

https://towardsdatascience.com/object-detection-part1-4dbe5147ad0a

BobLd commented 3 years ago

@deanbennettdeveloper Thanks for your help, I will have a look at the code you provided here. Did you modify anything else in the code? Your code basically replaces my NMS code, right?

@kellansteele:

deanbennettdeveloper commented 3 years ago

Hi BobLd, that's great, thanks for having a look at this. My code just replaces the GetResults you have in the Prediction class you created. I didn't modify anything else other than the number of classes, as I have 15 rather than 80.

I used Roboflow.ai to create the yolov5 (small) model. It only takes about an hour to train. The only bit I modified there is making sure it grabs the latest HEAD of the Yolov5 repo (currently it reverts to an older version). I initially had issues with the older version, as the model worked but failed to export to ONNX.

The inference time though on a small yolov5 model is fantastic.

deanbennettdeveloper commented 3 years ago

Thanks for the NMS examples! I'll try those in my code too.

deanbennettdeveloper commented 3 years ago

I guess this bit of my code does a basic type of NMS? Although I guess it would not pick up multiple detections in different parts of the image; that would be its flaw currently.

List<YoloV5Result> r = new List<YoloV5Result>();

  foreach(var label in resultsNms.Select(p => p.Label).Distinct()) {
    r.Add(resultsNms.Where(p => p.Label == label).OrderByDescending(p => p.Confidence).First());
  }
raulsf6 commented 3 years ago

@deanbennettdeveloper's approach is a good starting point. Confidence and labels are calculated correctly, so it is just a matter of tuning NMS and IoU to get the bounding boxes working.

raulsf6 commented 3 years ago

Hey! I think I properly fixed the bounding boxes issue in @deanbennettdeveloper's implementation by adding @BobLd's NMS implementation. Could you take a look? Apart from these two functions, it is important to say that interleavePixelColors: false was the key. The rest of the code just adjusts dimensions to 640x640 images and the yolov5 shape.

        public IReadOnlyList<YoloV5Result> GetResults(string[] categories, float scoreThres = 0.5f, float iouThres = 0.5f)
        {

            // Probabilities + Characteristics
            int characteristics = categories.Length + 5;

            // Needed info
            float modelWidth = 640.0F;
            float modelHeight = 640.0F;
            float xGain = modelWidth / ImageWidth;
            float yGain = modelHeight / ImageHeight;
            float[] results = Output;

            List<float[]> postProcessedResults = new List<float[]>();

            // For every cell of the image, format for NMS
            for (int i = 0; i < 25200; i++)
            {
                // Get offset in float array
                int offset = characteristics * i;

                // Get a prediction cell
                var predCell = results.Skip(offset).Take(characteristics).ToList();

                // Filter some boxes
                var objConf = predCell[4];
                if (objConf <= scoreThres) continue;

                // Get corners in original shape
                var x1 = (predCell[0] - predCell[2] / 2) / xGain; //top left x
                var y1 = (predCell[1] - predCell[3] / 2) / yGain; //top left y
                var x2 = (predCell[0] + predCell[2] / 2) / xGain; //bottom right x
                var y2 = (predCell[1] + predCell[3] / 2) / yGain; //bottom right y

                // Get real class scores
                var classProbs = predCell.Skip(5).Take(categories.Length).ToList();
                var scores = classProbs.Select(p => p * objConf).ToList();

                // Get the best class score and its index
                float maxConf = scores.Max();
                float maxClass = scores.IndexOf(maxConf);

                // Discard low-confidence predictions
                if (maxConf > scoreThres)
                {
                    // Format [ x1, y1, x2, y2, maxConf, maxClass ]
                    postProcessedResults.Add(new[] { x1, y1, x2, y2, maxConf, maxClass });
                }
            }

            var resultsNMS = ApplyNMS(postProcessedResults, categories, iouThres);

            return resultsNMS;
        }

        private List<YoloV5Result> ApplyNMS(List<float[]> postProcessedResults, string[] categories,  float iouThres=0.5f)
        {
            postProcessedResults = postProcessedResults.OrderByDescending(x => x[4]).ToList(); // sort by confidence
            List<YoloV5Result> resultsNms = new List<YoloV5Result>();

            int f = 0;
            while (f < postProcessedResults.Count)
            {
                var res = postProcessedResults[f];
                if (res == null)
                {
                    f++;
                    continue;
                }

                var conf = res[4];
                string label = categories[(int)res[5]];

                resultsNms.Add(new YoloV5Result(res.Take(4).ToArray(), label, conf));
                postProcessedResults[f] = null;

                var iou = postProcessedResults.Select(bbox => bbox == null ? float.NaN : BoxIoU(res, bbox)).ToList();
                for (int i = 0; i < iou.Count; i++)
                {
                    if (float.IsNaN(iou[i])) continue;
                    if (iou[i] > iouThres)
                    {
                        postProcessedResults[i] = null;
                    }
                }
                f++;
            }

            return resultsNms;
        }
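
Note that ApplyNMS above relies on a BoxIoU helper that is not reproduced in this comment (presumably it comes from @BobLd's NMS implementation mentioned earlier). For anyone reading along, a minimal sketch of such an IoU over the float[] { x1, y1, x2, y2, maxConf, maxClass } boxes used above (assumes using System):

    // Intersection over union of two boxes in [x1, y1, x2, y2, ...] format.
    private static float BoxIoU(float[] boxA, float[] boxB)
    {
        float Area(float[] b) => Math.Max(0f, b[2] - b[0]) * Math.Max(0f, b[3] - b[1]);

        float interX1 = Math.Max(boxA[0], boxB[0]);
        float interY1 = Math.Max(boxA[1], boxB[1]);
        float interX2 = Math.Min(boxA[2], boxB[2]);
        float interY2 = Math.Min(boxA[3], boxB[3]);

        float interArea = Math.Max(0f, interX2 - interX1) * Math.Max(0f, interY2 - interY1);
        float union = Area(boxA) + Area(boxB) - interArea;

        return union <= 0f ? 0f : interArea / union;
    }
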
BobLd commented 3 years ago

Hi @raulsf6, sounds great!! Would you be able to push your work in https://github.com/BobLd/YOLOv4MLNet/tree/yolo-v5-incl? I won't have the time to look at it today, but I think it would be great to have your work in a branch.

Concerning my implementation, more work needs to be done on the bounding boxes (fully implementing Detect() in C#), so it is normal that the BBoxes you get are not correct, even with NMS activated.

raulsf6 commented 3 years ago

@BobLd sure! I have to say the final fix was setting resizing: ResizingKind.Fill, and it works perfectly. I'm going to clean up the code and push it.

deanbennettdeveloper commented 3 years ago

Hi @raulsf6 That's fantastic news, thanks for looking into this and adding those missing pieces! I'll give this a go too on my code and hopefully it will also fix it for me.

In terms of ResizingKind.Fill, I've taken a different approach and gone with padding, as that is also how I trained my model. So I guess this setting is important depending on how the model was built; this is a good article on that:

https://blog.roboflow.com/you-might-be-resizing-your-images-incorrectly/

Thank you both again (@BobLd & @raulsf6) for the work on this. Looks like we've finally got a full Yolov5 working with ML.NET.

deanbennettdeveloper commented 3 years ago

Hi @raulsf6, you are definitely right about the ResizingKind.Fill setting. I've now changed to this with your code and the boxes are perfect. However, I'm not getting as good detection with this method; I'll have more of a play with the model and different training image settings. I'm sure I'll hit a sweet spot eventually. Thanks for the code, it works great.

raulsf6 commented 3 years ago

Hi @deanbennettdeveloper and @BobLd, I just made a pull request with the code. Thanks to your work I could learn a lot about computer vision and Yolo. Check the code out when you have some time and tell me if something could be improved!

deanbennettdeveloper commented 3 years ago

Hi @raulsf6, I've worked out the issue with ResizingKind.IsoPad not working correctly. It dawned on me that the coords of the boxes from the results were based on a padded image, which has black bars either at the top and bottom or at the left and right depending on the aspect ratio. The same coords were then being applied to the original image, which hadn't been padded. So the trick is to calculate the xoffset and yoffset that would have been used in the model image transformation and use these to adjust the xGain and yGain, plus translating the centre x and centre y.

Here's my original code (before your NMS), but if you do decide to use IsoPad and train the model with padding, then this is what you'll need. I am finding much better confidence results with the padded-image model as there is no distortion of the image.

public IReadOnlyList<YoloV5Result> GetResults(string file, float imageWidth, float imageHeight, string[] categories, float scoreThres = 0.5f, float iouThres = 0.5f) {

  //List<float[]> postProcesssedResults = new List<float[]>();

  int size = Output.Length; // 1x25200x85=2142000
  int dimensions = categories.Length + 5;
  int rows = (int)(size / dimensions); //25200
  int confidenceIndex = 4;
  int labelStartIndex = 5;
  float modelWidth = 640.0F;
  float modelHeight = 640.0F;

  float xoffset = 0;
  float yoffset = 0;

  float aspectratio = imageWidth / imageHeight;

  if (aspectratio > 1) { // most common for videos
    float actual_imageheight = modelWidth / aspectratio;
    yoffset = (modelHeight - actual_imageheight) / 2F;
  } else { // I guess common for mobile phones videos
    float actual_imagewidth = modelHeight * aspectratio;
    xoffset = (modelWidth - actual_imagewidth) / 2F;
  }

  float xGain = (modelWidth - (xoffset * 2)) / imageWidth;
  float yGain = (modelHeight - (yoffset * 2)) / imageHeight;

  List<YoloV5Result> resultsNms = new List<YoloV5Result>();

  for (int i = 0; i < rows; ++i) {

    int index = i * dimensions;
    if (Output[index + confidenceIndex] <= scoreThres) continue;

    for (int j = labelStartIndex; j < dimensions; ++j) {
      Output[index + j] = Output[index + j] * Output[index + confidenceIndex];
    }

    // Remove the padding offset once per row, then convert the centre/size
    // values to corner coordinates in the original image space
    float centreX = Output[index] - xoffset;
    float centreY = Output[index + 1] - yoffset;

    var x1 = (centreX - Output[index + 2] / 2) / xGain; //top left x
    var y1 = (centreY - Output[index + 3] / 2) / yGain; //top left y
    var x2 = (centreX + Output[index + 2] / 2) / xGain; //bottom right x
    var y2 = (centreY + Output[index + 3] / 2) / yGain; //bottom right y

    for (int k = labelStartIndex; k < dimensions; ++k) {

      if (Output[index + k] <= scoreThres) continue;

      string label = categories[k - labelStartIndex];
      float confidence = Output[index + k];

      resultsNms.Add(new YoloV5Result(x1, y1, x2, y2, label, confidence, ""));
    }
  }

  List<YoloV5Result> r = new List<YoloV5Result>();

  foreach(var label in resultsNms.Select(p => p.Label).Distinct()) {
    r.Add(resultsNms.Where(p => p.Label == label).OrderByDescending(p => p.Confidence).First());
  }

  return r;

}
BobLd commented 3 years ago

@deanbennettdeveloper, if you think it is useful, please make a commit to the same branch as @raulsf6. It could be useful to have everything in a single branch.

keesschollaart81 commented 3 years ago

I wanted to work in C# with the latest Yolov5 (release 4). The script provided by @BobLd in https://github.com/BobLd/YOLOv4MLNet/issues/2#issuecomment-748586959 came close but did not work for me. I'm still not sure why, but the way the offset was calculated resulted in wrong data for the output of my model.

I had to rewrite and optimize here and there along the way. As this topic/repo was very useful, I wanted to give back a bit by sharing my script: https://gist.github.com/keesschollaart81/83de609f0852670656290fe0180da318. I'm not using ML.NET, but 96% of the code can be reused if you like.

It's basically my C# rewrite of this

BobLd commented 3 years ago

@keesschollaart81, amazing! I'll add that to the ReadMe.