Problem with a custom trained yolov5 ONNX model

HamidEbr commented 3 years ago

Hi, thanks for providing source code for yolov5. I've tried it and it works for me with the default yolov5 ONNX files. I trained a custom model with PyTorch (.pt) and converted it to ONNX. I tried to adapt the parameters in your code to my own ONNX file by opening it in the Netron app. My model's input is float32[16,3,640,640] and its output is float32[16,3,80,80,33], which does not have the same dimensions as the yolov5s_full_layer output (float32[1,25200,85]). Also, when I run the code, it throws the following exception at var model = pipeline.Fit(mlContext.Data.LoadFromEnumerable(new List<YoloV4BitmapData>())); :

System.InvalidOperationException: 'Input shape mismatch: Input 'images' has shape 16,3,640,640, but input data is of length 1228800.'

Please help me to do the right changes.

BobLd commented 3 years ago

Concerning your Error message

If the input and output layers are properly defined in your ONNX, I would suggest removing or commenting out the following lines (keeping in mind that your issue comes from the definition of the images input shape): https://github.com/BobLd/YOLOv4MLNet/blob/0104728917e3001e5b04feb794c3ffcb6dd8a819/YOLOv4MLNet/Program.cs#L35-L39

If this doesn't work, you need to investigate further:

According to the error message, it seems the input layer of your model is actually 1,3,640,640 (and not 16,3,640,640). This wouldn't be a surprise, as the first dimension is the batch size, and ONNX converters tend to set it to 1 (even if your original model's batch size is 16).
Can you share a screenshot of your ONNX model (and not the PyTorch one) in Netron?

Concerning the differences in output layers

Please see my comment at https://github.com/BobLd/YOLOv4MLNet/issues/2#issuecomment-740634809. I think the difference is related to the Detect() method being (or not being - I don't remember) in your ONNX model. A few comments:

Given your model output shape 16,3,80,80,33, I doubt you have only one output layer (I'd guess you have 3, with shapes 16,3,20,20,33, 16,3,40,40,33 and 16,3,80,80,33).
Again, please share a screenshot of your ONNX model in Netron so that we can see the input and output layers.

HamidEbr commented 3 years ago

Thanks. Here is the screenshot of the Netron preview:

I updated my code as you suggested, but this time it throws another exception:

BobLd commented 3 years ago

The new error comes from the fact that your model expects 16 images as input (batch size is 16) and you only input 1 image (input data is of size 1,228,800 = 1 x 3 x 640 x 640).

Try changing the batch size to 1 when you export your model, and change the batch size to 1 in the C# code (alternatively, you could try to input 16 images in the C# code, but I don't know how to do that).
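The size mismatch can be checked by hand. The following is a minimal Python sketch, not code from the repository; the helper name `expected_length` is made up for illustration. It multiplies out the two shapes mentioned in this thread to show why a single image does not fill a batch-16 input.

```python
from math import prod


def expected_length(shape):
    """Number of float values a tensor of the given shape holds (hypothetical helper)."""
    return prod(shape)


single_image = (1, 3, 640, 640)     # what the C# code feeds: one image
exported_batch = (16, 3, 640, 640)  # what the exported ONNX input declares

print(expected_length(single_image))    # 1228800, the length in the error message
print(expected_length(exported_batch))  # 19660800, what a batch of 16 would need
```

The two numbers only agree when the exported batch size is 1, which is why re-exporting with batch size 1 is the simpler fix.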

Concerning your output layers, you indeed have 3:

Have a look at the issue here https://github.com/BobLd/YOLOv3MLNet/issues/2
Have a look at the YoloV4 in the master branch; it also has 3 output layers.

HamidEbr commented 3 years ago

Thanks for your help. I'll check it and let you know. What can I do about this part:

I know the 25200 comes from your model's output, but its dimensions are different from my model's.
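For reference, here is a rough Python sketch of where 25200, 85 and 33 come from. It assumes the usual YOLOv5 layout, which is not spelled out in this thread: three output layers at strides 8, 16 and 32, three anchors per grid cell, and 5 box/objectness values plus one value per class. The function names are illustrative only.

```python
def total_predictions(image_size=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Predictions summed over all output layers (assumed strides and anchor count)."""
    return sum(anchors_per_cell * (image_size // s) ** 2 for s in strides)


def values_per_prediction(num_classes):
    """x, y, w, h, objectness, then one score per class."""
    return 5 + num_classes


print(total_predictions())        # 3 * (80*80 + 40*40 + 20*20) = 25200
print(values_per_prediction(80))  # 85, as in float32[1,25200,85]
print(values_per_prediction(28))  # 33, as in float32[16,3,80,80,33]
```

Under these assumptions, the three separate layers (80x80, 40x40 and 20x20 grids) hold the same 25200 predictions as the single concatenated output; only the last dimension changes with the class count.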

HamidEbr commented 3 years ago

I converted the .pt file to ONNX again, this time with batch size = 1, and I no longer get an exception for my model. However, I now get no predictions, I think because of the predCell indexes or the objConf value. Is there anything I need to change given that my class count (28) is different from yours (80)? With my model, objConf is always a negative number. When I run your model I don't see any negative results, but with my model most of the results are negative!
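One possible explanation for the negative values, which this thread does not confirm, is that an ONNX export without the Detect() step returns raw values that have not yet been passed through a sigmoid. The Python sketch below only illustrates that idea; `class_scores` and the cell layout `[x, y, w, h, obj, cls0, ...]` are assumptions, and the grid and anchor decoding of the box itself is left out.

```python
import math


def sigmoid(x):
    """Map a raw value to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def class_scores(raw_cell):
    """Per-class confidence for one raw prediction (assumed layout, hypothetical helper)."""
    obj_conf = sigmoid(raw_cell[4])                        # objectness after sigmoid
    return [obj_conf * sigmoid(c) for c in raw_cell[5:]]   # one score per class


print(sigmoid(-2.0))                  # about 0.119: a negative raw value is a low score
print(len(class_scores([0.0] * 33)))  # 28 scores for a 33-value prediction
```

If this is the cause, the 5-value offset before the class scores stays the same for 28 classes; only the number of class entries per prediction changes.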

HamidEbr commented 3 years ago

At last, with the help of this link, solved!

