ermig1979 / Simd

C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
http://ermig1979.github.io/Simd
MIT License

SimdMotion performance #184

Closed trlsmax closed 2 years ago

trlsmax commented 2 years ago

Hi, I am trying to build a meteor detector using the SimdMotion class on a Raspberry Pi 4. In my test on an overclocked Raspberry Pi 4, it takes around 55 ms to process one 1920x1080 frame. Is there any chance to boost the performance of SimdMotion with something like OpenMP or TBB, so that the processing time gets down to 30 ms?

I tried adding #pragma omp parallel for to the functions EstimateDifference and Apply, but it made things worse.

ermig1979 commented 2 years ago

Hi! Are you sure that the EstimateDifference function is the most performance-intensive one? Maybe it is another function? You can define the macro SIMD_CHECK_PERFORMANCE to get a more detailed performance report (see file TestMotion.cpp).

trlsmax commented 2 years ago

Yes, I am sure. I used Tracy to profile the processing. The average frame processing time is 55 ms. EstimateDifference takes about 23 ms, and UpdateBackground takes about 21 ms.

ermig1979 commented 2 years ago

The parameters used (to detect a very small and fast object on a big image) cause the full frame to be used in the background model. The background model contains 3 features, and every feature has 5 grayscale image pyramids. In total the background model takes about 40-50 MB. The current algorithm accesses this model twice every frame (in the EstimateDifference and UpdateBackground functions). I think that the algorithm is bound by memory bandwidth. Obviously, #pragma omp parallel does not give any benefit in this case.
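To make the size estimate above concrete, here is a rough back-of-the-envelope sketch. It assumes 8-bit grayscale pyramids whose levels halve in each dimension (rounding up), with no row alignment or padding; the real Simd layout and level count may differ. The function names `PyramidBytes` and `BackgroundModelBytes` are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one 8-bit grayscale pyramid. Each level is half the size of the
// previous one in both dimensions (rounded up).
// Assumption: no alignment or padding, which the real library may add.
std::size_t PyramidBytes(std::size_t width, std::size_t height, std::size_t levels)
{
    assert(width > 0 && height > 0 && levels > 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < levels; ++i)
    {
        total += width * height;          // 1 byte per pixel
        if (width == 1 && height == 1)
            break;                        // cannot shrink any further
        width = (width + 1) / 2;
        height = (height + 1) / 2;
    }
    return total;
}

// Total bytes for a model with the given number of features and pyramids
// per feature (3 and 5 are the numbers mentioned in this thread).
std::size_t BackgroundModelBytes(std::size_t width, std::size_t height, std::size_t levels,
    std::size_t features = 3, std::size_t pyramidsPerFeature = 5)
{
    return features * pyramidsPerFeature * PyramidBytes(width, height, levels);
}
```

For a 1920x1080 frame and an assumed 5 levels, `BackgroundModelBytes(1920, 1080, 5)` gives 41,432,400 bytes, roughly 41 MB, which is in line with the 40-50 MB estimate. A working set of that size is far larger than the CPU caches on a Raspberry Pi 4, so each pass over the model has to go to main memory.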

We can't make these algorithms faster, but we can run them less often. The EstimateDifference function must run every frame to detect a fast object. But we don't need to call UpdateBackground every frame (the night sky changes very slowly). So we can add an additional parameter to skip calls to UpdateBackground.
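The idea of skipping background updates can be sketched as a simple time gate. This is only an illustration of the principle, not the code in Simd: the class `UpdateGate` and its method are hypothetical, and it assumes frame timestamps are given in seconds.

```cpp
#include <cassert>

// Hypothetical helper (not part of Simd): lets the caller update the
// background only when at least 'interval' seconds have passed since
// the previous update.
class UpdateGate
{
public:
    explicit UpdateGate(double interval)
        : _interval(interval), _last(0.0), _started(false)
    {
        assert(interval >= 0.0);
    }

    // Returns true if the background should be updated for a frame with
    // the given timestamp (in seconds). The very first frame always updates.
    bool ShouldUpdate(double time)
    {
        if (_started && time - _last < _interval)
            return false;   // too soon: skip the expensive update
        _last = time;
        _started = true;
        return true;
    }

private:
    double _interval;
    double _last;
    bool _started;
};
```

In a frame loop, the difference estimation would still run for every frame, while the background update would run only when `ShouldUpdate(frameTime)` returns true. With an interval of 0.2 s and 25 frames per second, that is roughly one update per five frames.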

ermig1979 commented 2 years ago

I added the parameter BackgroundStatUpdateTime. Try setting: options.BackgroundStatUpdateTime = 0.2;

ermig1979 commented 2 years ago

I added a second optimization: in our case we in fact don't use the gradient (dx and dy) features (options.DifferenceDxFeatureWeight = 0, options.DifferenceDyFeatureWeight = 0). I switched off processing of such features (those whose weight is equal to zero). I think this can significantly increase performance. I will wait for your feedback.
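The effect of switching off zero-weight features can be illustrated with a small sketch. It is not the Simd implementation: the `Feature` struct and `WeightedDifference` function are hypothetical, and the difference measure (a weighted sum of absolute per-pixel differences) is a simplifying assumption. The point is only that a feature with zero weight contributes nothing, so its memory does not need to be read at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical feature: current values, background values, and a weight.
struct Feature
{
    std::vector<uint8_t> current;
    std::vector<uint8_t> background;
    double weight;
};

// Weighted sum of absolute differences per pixel over all features.
// Features with zero weight are skipped entirely, so their data is never
// touched. If 'processed' is not null, it receives the number of features
// that were actually read.
std::vector<double> WeightedDifference(const std::vector<Feature>& features,
    std::size_t size, std::size_t* processed = nullptr)
{
    std::vector<double> difference(size, 0.0);
    std::size_t count = 0;
    for (const Feature& feature : features)
    {
        if (feature.weight == 0.0)
            continue;   // no contribution: skip the memory traffic
        assert(feature.current.size() >= size && feature.background.size() >= size);
        for (std::size_t i = 0; i < size; ++i)
        {
            int d = int(feature.current[i]) - int(feature.background[i]);
            difference[i] += feature.weight * std::abs(d);
        }
        ++count;
    }
    if (processed)
        *processed = count;
    return difference;
}
```

With three features of which two have zero weight, only one third of the feature data is read per frame, which matters when the algorithm is limited by memory bandwidth rather than by arithmetic.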

trlsmax commented 2 years ago

Hi, thank you very much! The average frame processing time is now 25 ms, EstimateDifference now takes 13 ms, and UpdateBackground just 1.8 ms. Great! Maybe I should prepare a field test.