NickSwardh / YoloDotNet

YoloDotNet - A C# .NET 8.0 project for Classification, Object Detection, OBB Detection, Segmentation and Pose Estimation in both images and videos.
GNU General Public License v3.0

Post processing instance segmentation taking a lot of time #14

Open lwillems191 opened 3 months ago

lwillems191 commented 3 months ago

Hello,

When using instance segmentation, the post-processing takes quite a lot of time, and I was wondering if there might be a way to optimize it. I found the line that takes the most time, but I have not found a way to optimize it. Maybe somebody else has a good idea.

var value = Enumerable.Range(0, output.Channels).Sum(i => tensor1[0, i, y, x] * maskWeights[i]);

aloksharma1 commented 3 months ago

can you try this? (untested but changing to a for loop would surely improve it)

float value = 0;
for (int i = 0; i < output.Channels; ++i)
{
    value += tensor1[0, i, y, x] * maskWeights[i];
}
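
To make the difference concrete, here is a minimal, self-contained sketch of both variants. It is not YoloDotNet code: the class `MaskSum`, its method names, and the flat `channelValues` array (standing in for the `tensor1[0, i, y, x]` values of one pixel) are assumptions for illustration. The LINQ version allocates a range iterator and a closure on every call, which adds up when it runs once per pixel; the plain loop avoids those allocations.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of YoloDotNet.
public static class MaskSum
{
    // LINQ version, same shape as the line from the original report.
    // Each call allocates an iterator for Enumerable.Range and a closure for the lambda.
    public static float SumLinq(float[] channelValues, float[] maskWeights) =>
        Enumerable.Range(0, channelValues.Length)
                  .Sum(i => channelValues[i] * maskWeights[i]);

    // Plain loop over spans: no per-call allocations.
    public static float SumLoop(ReadOnlySpan<float> channelValues, ReadOnlySpan<float> maskWeights)
    {
        float value = 0f;
        for (int i = 0; i < channelValues.Length; i++)
        {
            value += channelValues[i] * maskWeights[i];
        }
        return value;
    }
}
```

Both methods take the per-pixel channel values and the mask weights, for example `MaskSum.SumLoop(values, weights)` with two `float[]` arrays of equal length. Whether the loop is actually faster in a given setup should be measured rather than assumed.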

NickSwardh commented 3 months ago

I'm currently working on improving the overall performance in this branch: https://github.com/NickSwardh/YoloDotNet/tree/performance. There I've replaced the usage of a legacy ONNX class with the OrtValue API for improved performance, along with a few other tweaks here and there.

aloksharma1 commented 2 months ago

is this branch prod ready (on nuget)?

lwillems191 commented 2 months ago

Yeah this branch already gives a great improvement. Thanks for the work you put into it.

NickSwardh commented 2 months ago

Awesome! Thank you for letting me know :)

NickSwardh commented 2 months ago

is this branch prod ready (on nuget)?

No, not yet. It's a work in progress. I'm still turning the nuts and bolts to see if I can squeeze some more speed out of this thing ;)

louislewis2 commented 2 months ago

Hi @NickSwardh

First off, thanks for the great library you have created here!

Inspired by this issue and facing some performance issues myself, I forked your branch and first added some benchmarks so that code changes made for performance can be validated. Once the benchmarks were in place, I was able to spot some quick wins that, at least in my testing, have dramatically improved the overall performance. I also added a few other useful benchmarks to start understanding where time is spent and memory is allocated. The reduced GC pressure has increased the overall throughput of my application, since there are now fewer GC-induced pauses.

The benchmarks that require it also run both GPU and CPU variations, so that one can spot improvements or regressions on both at the same time.

I have created a PR if you are interested. I apologize up front for its size; some refactoring seemed fitting to allow sharing of resources such as the assets.

https://github.com/NickSwardh/YoloDotNet/pull/16

lwillems191 commented 2 weeks ago

I did some more testing. The new improvements already make the code a lot faster, but from my testing it seems it might be better not to use Parallel.For loops. Their timings seem a lot less consistent than those of a normal for loop, and the speed improvement does not seem that large either. Hopefully you will take this into consideration.
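
For anyone who wants to reproduce this kind of comparison, a rough sketch of how the per-run timings could be collected is shown below. It is not the code from the branch: `LoopTiming`, `ScaleSequential`, `ScaleParallel`, and `Measure` are hypothetical names, and the work being timed (scaling an array) is only a stand-in for the real post-processing.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper for comparing loop styles; not part of YoloDotNet.
public static class LoopTiming
{
    // Sequential baseline: multiply every element in place.
    public static void ScaleSequential(float[] data, float factor)
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] *= factor;
        }
    }

    // Same work spread over the thread pool with Parallel.For.
    public static void ScaleParallel(float[] data, float factor)
    {
        Parallel.For(0, data.Length, i => data[i] *= factor);
    }

    // Runs the action several times and returns the elapsed milliseconds of each run,
    // so that the spread between runs is visible and not just the average.
    public static double[] Measure(Action action, int runs)
    {
        var timings = new double[runs];
        for (int r = 0; r < runs; r++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            timings[r] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return timings;
    }
}
```

Calling `LoopTiming.Measure(() => LoopTiming.ScaleParallel(data, 1.5f), 50)` and the same for `ScaleSequential`, then comparing the minimum and maximum of each result, gives a first impression of how consistent each variant is. A dedicated benchmarking tool would give more reliable numbers than this Stopwatch loop.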