YOLOv5 v5: 3x20x20x85 output is kept after NMS, ANE fails

Hi,

Thanks for the great work. I stumbled on a few issues when converting the model from YOLOv5 v5, but nothing too serious.

The first was that the 3x20x20x85 output of the YOLO model was returned as an output of the final pipeline. Adding:

```python
# Drop the raw 3x20x20x85 head output (renamed "p20") from the pipeline outputs
if builder.spec.description.output[-1].name == "p20":
    del builder.spec.description.output[-1]
```

solved this issue. Here, I had previously renamed the outputs of the YOLO model as follows:

3x80x80x85 output -> p80
3x40x40x85 output -> p40
3x20x20x85 output -> p20
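For reference, the three head outputs follow from YOLOv5's default 640x640 input and its strides of 8, 16 and 32 (640 / 32 = 20, and so on); the last dimension is 4 box coordinates + 1 objectness score + 80 COCO class scores = 85. The sketch below is a minimal illustration, not code from either repo: `head_outputs` derives the names used above, and `drop_outputs` removes outputs by name instead of assuming the unwanted one is last. Both function names are hypothetical, and `spec` is assumed to be a Core ML model spec (for example from `MLModel.get_spec()` in coremltools) whose `description.output` is a protobuf repeated field.

```python
def head_outputs(img_size=640, strides=(8, 16, 32), num_anchors=3, num_classes=80):
    """Map each raw head name (named after its grid size, e.g. "p20") to its shape.

    Shape is (anchors, grid, grid, 4 box coords + 1 objectness + num_classes).
    """
    shapes = {}
    for stride in strides:
        grid = img_size // stride
        shapes[f"p{grid}"] = (num_anchors, grid, grid, 5 + num_classes)
    return shapes


def drop_outputs(spec, names):
    """Delete every entry of spec.description.output whose name is in `names`.

    Walks the outputs backwards so a deletion never shifts an index that is
    still to be visited. Only len(), indexing and `del` are used, which
    protobuf repeated fields and plain lists both support. Returns the
    removed names in their original order.
    """
    outputs = spec.description.output
    removed = []
    for i in reversed(range(len(outputs))):
        if outputs[i].name in names:
            removed.append(outputs[i].name)
            del outputs[i]
    removed.reverse()
    return removed


# Hypothetical usage with coremltools:
#   spec = pipeline_model.get_spec()
#   drop_outputs(spec, set(head_outputs()))  # removes p80/p40/p20 if present
#   model = coremltools.models.MLModel(spec)
```

Deleting by name is a design choice: if the output order ever changes, the `[-1]` check silently does nothing, while a name-based lookup still removes the right entry.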

The second issue appeared when running the model in Apple's object detection demo app, Breakfast Finder: there, the pipeline model does not seem to run on the ANE. It does seem to run on the GPU, though, since CPU usage in the app stays fairly low (30%) with the yolov5s.pt checkpoint from the ultralytics repo.

The yolov5s.pt model was converted using the export.py script and the environment from their repo, mainly coremltools 4.1 and PyTorch 1.9.

> The first one was that the 3x20x20x85 output from the Yolo model was returned as an output of the final pipeline.

I cannot reproduce this. Could you specify where exactly it is returned? In the model metadata shown in Xcode, everything looks correct to me.
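To pin down where an extra output shows up, one could list the output names at the pipeline level and for every sub-model separately. A minimal sketch, assuming a plain Core ML pipeline whose sub-models sit in `spec.pipeline.models` (pipeline classifiers and regressors nest them differently); `list_outputs` is a hypothetical helper, and `spec` could come from `coremltools.utils.load_spec(...)`.

```python
def list_outputs(spec):
    """Return output names for the pipeline itself and for each sub-model.

    Only the "pipeline" entry describes what the model exposes to an app;
    sub-model outputs that are not repeated there stay internal to the pipeline.
    """
    report = {"pipeline": [out.name for out in spec.description.output]}
    for index, sub_model in enumerate(spec.pipeline.models):
        report[f"model_{index}"] = [out.name for out in sub_model.description.output]
    return report
```

If "p20" appears under "pipeline", the app receives it as an extra output; if it only appears under "model_0", it is consumed inside the pipeline.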

> The yolov5s.pt model was converted using the export.py function and the environment from their repo. Mainly coremltools 4.1 and pytorch 1.9

I'm confused about what you mean by having used the "export.py" script. That export script converts the PyTorch model to other formats, for example ONNX or Core ML. Our script is essentially a more elaborate version of it. So did you use our script or theirs?

Also, in case you used our script: did you read the README, especially the notes on ANE support?

Here is the relevant part without the links; have a look at the README for the formatted version with links:

Note: Whether the model runs on the Neural Engine or on the CPU/GPU (or switches between them) on your device has a huge impact on performance. Unfortunately, there is no documentation of which model layers can and cannot run on the Neural Engine (some info here). With YOLOv5 versions 2, 3 and 4 there were problems with the SPP layers using kernel sizes larger than 7, so we replaced them and retrained the model. On a recent device, YOLOv5s should take around 20 ms per detection. See Issue 2526.
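The note does not say what the SPP layers were replaced with. As an illustration only (and untested on the ANE here), stride-1 max pools with a small kernel can be stacked to reproduce a larger one: two 5-wide pools cover the same window as one 9-wide pool, and three cover 13, which is the idea behind the SPPF block in later YOLOv5 releases. The 1D sketch below checks this numerically; `maxpool1d` and `stacked` are hypothetical helpers that treat padding as -inf (equivalent to clipping the window), matching how PyTorch's `MaxPool2d` pads. A square 2D max pool separates into a row pass and a column pass, so the same equivalence carries over to 2D.

```python
def maxpool1d(x, k):
    """Stride-1 max pool that keeps the input length; k must be odd.

    Padding of k // 2 on each side with -inf is equivalent to clipping the window.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    r = k // 2
    n = len(x)
    return [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def stacked(x, k, times):
    """Apply maxpool1d with kernel k `times` times in a row."""
    for _ in range(times):
        x = maxpool1d(x, k)
    return x


# Window covered by `times` stacked stride-1 pools: times * (k - 1) + 1
#   2 x k=5 -> 9,  3 x k=5 -> 13
```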

Perhaps you could also benchmark how long a single detection takes with your model.
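One simple way to get such a number is to time repeated predictions after a few warm-up calls and report the median. The helper below is a hypothetical, framework-agnostic sketch that times any zero-argument callable. With coremltools on a Mac one might pass `lambda: model.predict({"image": img})`, where the input name `"image"` is an assumption; Mac timings also do not necessarily match an iPhone or iPad running Breakfast Finder.

```python
import statistics
import time


def benchmark_ms(predict, warmup=5, runs=50):
    """Return the median wall-clock time of one predict() call, in milliseconds."""
    for _ in range(warmup):
        predict()  # discard warm-up calls, which may include one-off setup costs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

The median is used instead of the mean so that occasional scheduling hiccups do not skew the result; it can then be compared against the roughly 20 ms per detection mentioned in the README.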

dbsystel / yolov5-coreml-tools, issue #3