Open balintlaczko opened 3 years ago
Although Numba could add some speed improvements, I don't think it would solve the multicore part of the issue; it would only speed up the work that happens on a single core. A bit of research also hinted that OpenCV does not always cooperate with Numba in obvious ways. So I put Numba aside (for now) and went ahead and implemented a scalable motion function using `multiprocessing`. This should scale much better, since it can use all the available cores on the system (which will hopefully be great for VDI). It is currently implemented as a separate method, but after successful platform testing I'll expose multiprocessing (and then the number of cores to use) as parameters.
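For readers following along, here is a minimal sketch of how frame differencing can be split across processes. It is not the toolbox's actual code: `split_frames`, `make_jobs`, `diff_consecutive`, and this `motion` are hypothetical names, and frames are plain lists of integers standing in for grayscale images so the example needs no dependencies. The point it illustrates is that each chunk has to include one frame before its start, otherwise the difference at each chunk boundary would be lost.

```python
from multiprocessing import Pool


def split_frames(n_frames, n_procs):
    """Split range(n_frames) into up to n_procs contiguous (start, stop) chunks."""
    base, extra = divmod(n_frames, n_procs)
    chunks, start = [], 0
    for i in range(n_procs):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when n_procs > n_frames
            chunks.append((start, stop))
        start = stop
    return chunks


def make_jobs(frames, n_procs):
    """Build one frame slice per chunk, overlapping the previous chunk by one frame."""
    return [frames[max(start - 1, 0):stop]
            for start, stop in split_frames(len(frames), n_procs)]


def diff_consecutive(frames):
    """Absolute difference between each frame and the one before it."""
    return [[abs(c - p) for p, c in zip(prev, curr)]
            for prev, curr in zip(frames, frames[1:])]


def motion(frames, n_procs=1):
    """Hypothetical motion function: frame differences, optionally in parallel."""
    jobs = make_jobs(frames, n_procs)
    if n_procs == 1:
        parts = [diff_consecutive(job) for job in jobs]
    else:
        with Pool(n_procs) as pool:
            parts = pool.map(diff_consecutive, jobs)  # map preserves job order
    return [row for part in parts for row in part]


if __name__ == "__main__":
    # Synthetic "video": 10 frames of 4 pixels each.
    frames = [[(i * j) % 7 for j in range(4)] for i in range(10)]
    same = motion(frames, n_procs=1) == motion(frames, n_procs=3)
    print("identical results with 1 and 3 processes:", same)
```

Because `pool.map` returns results in job order, concatenating the parts gives the same sequence as the single-process loop.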
OK. The multicore version of `motion` is thoroughly tested, so it produces identical results regardless of the number of processes (checked the CSV line by line, the motiongrams pixel by pixel, and the videos frame by frame). Tested on Ubuntu, and it seems to check out (after the bugfixes). I still need to check on macOS and Colab before moving to the next step (which will be fully integrating it into the default `mg_motion`).
A quick (single-shot) benchmarking attempt on my 6-core 12-thread laptop:
With 2 cores it is 1.832377 times faster.
With 3 cores it is 2.367094 times faster.
With 4 cores it is 2.751237 times faster.
With 5 cores it is 3.021663 times faster.
With 6 cores it is 3.061283 times faster.
With 7 cores it is 3.138544 times faster.
With 8 cores it is 3.218801 times faster.
With 9 cores it is 3.183287 times faster.
With 10 cores it is 3.219616 times faster.
With 11 cores it is 3.270581 times faster.
With 12 cores it is 3.157296 times faster.
It is a bit curious that performance dropped at the maximum number of cores available; maybe it is just measurement error. However, it is also clear that (at least on Windows) spawning more and more processes leads to diminishing returns. The improvement going from a single core to two cores is enormous. Interestingly, the relative gain from 1 core to 2 cores (about 1.83 times) is bigger than the relative gain from 2 cores to 12 cores (about 1.72 times).
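One way to read these numbers, assuming Amdahl's law is a reasonable model here (an assumption of this sketch, not something measured in this thread), is to ask what parallel fraction each measurement would imply. The speedups below are copied from the benchmark above; `amdahl_speedup` and `parallel_fraction` are illustrative helper names.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's law for parallel fraction p on n workers."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's law: the parallel fraction implied by a measured speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# Speedups from the single-shot benchmark above (processes -> times faster).
measured = {
    2: 1.832377, 3: 2.367094, 4: 2.751237, 5: 3.021663,
    6: 3.061283, 7: 3.138544, 8: 3.218801, 9: 3.183287,
    10: 3.219616, 11: 3.270581, 12: 3.157296,
}

for n, s in measured.items():
    p = parallel_fraction(s, n)
    print(f"{n:2d} processes: speedup {s:.2f}, implied parallel fraction {p:.2f}")
```

If the implied fraction falls as the process count grows, that suggests overheads beyond a fixed serial part (process startup, I/O, or threads sharing the six physical cores), though a single-shot measurement cannot separate these.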
The motion function (technically a method) is implemented in OpenCV (there is also an FFmpeg-based implementation in `_utils.py`, but it produces slightly different results), and since it does a lot of matrix operations in one big loop, it basically maxes out one core of the CPU. I recently started to study the Numba library, and I think it is well suited to this situation. It even supports CUDA, which could also be an item on our enhancement list, but for now I would be happy to see improved speeds on the CPU alone. With most of the functions now based on FFmpeg, the slowness of the motion function sticks out a bit too much (especially considering that it is one of the most-used functions). One thing that could simplify the implementation is that we already work mostly with NumPy arrays in the motion function, so there probably won't be too many changes necessary. A plus: since librosa has Numba as a dependency, we wouldn't extend our dependencies by using it.
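To make the Numba idea concrete, here is a rough sketch of what a JIT-compiled per-frame step could look like. It assumes the inner step is a thresholded absolute frame difference, which is a guess for illustration and not taken from the toolbox; `frame_motion` and `thresh` are hypothetical names. If Numba is not installed, the decorator falls back to a no-op so the code still runs as plain Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(func):
        return func


@njit
def frame_motion(prev, curr, thresh):
    """Thresholded absolute difference of two grayscale uint8 frames."""
    height, width = prev.shape
    out = np.zeros((height, width), dtype=np.uint8)
    for y in range(height):
        for x in range(width):
            # Convert to int first so the subtraction cannot wrap around in uint8.
            diff = abs(int(curr[y, x]) - int(prev[y, x]))
            if diff > thresh:
                out[y, x] = diff
    return out


prev = np.array([[10, 200], [0, 50]], dtype=np.uint8)
curr = np.array([[12, 100], [0, 90]], dtype=np.uint8)
print(frame_motion(prev, curr, 5))
```

The explicit loop is the kind of code Numba compiles well; whether it would beat the existing OpenCV calls is something only a benchmark can tell.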