georgmartius / vid.stab

Video stabilization library
http://public.hronopik.de/vid.stab/

Change compareSubImg_thr_sse2 to use the intrinsic method _mm_sad_epu8 #99

Open gabilan opened 3 years ago

gabilan commented 3 years ago

This PR changes how SAD (Sum of Absolute Differences) is computed in the SSE-optimized path. There is a built-in intrinsic for computing SAD, _mm_sad_epu8 (the PSADBW instruction). Switching to the intrinsic improves motion estimation/detection performance by 2x. All current tests pass, though more tests may be needed.
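To illustrate the idea (this is a minimal sketch, not the code in the patch), a SAD over two byte buffers can be written with _mm_sad_epu8 roughly as follows. The names sad_scalar and sad_sse2 are hypothetical and not part of vid.stab. The sketch assumes an x86 target with SSE2 enabled and a length that is a multiple of 16.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference: sum of |a[i] - b[i]| over n bytes. */
static unsigned int sad_scalar(const uint8_t *a, const uint8_t *b, int n) {
    unsigned int sum = 0;
    for (int i = 0; i < n; i++)
        sum += (unsigned int)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* SSE2 variant (hypothetical helper). n must be a multiple of 16. */
static unsigned int sad_sse2(const uint8_t *a, const uint8_t *b, int n) {
    assert(n % 16 == 0);
    __m128i acc = _mm_setzero_si128();
    for (int i = 0; i < n; i += 16) {
        /* Unaligned loads, since the buffers may start at any address. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PSADBW yields two partial sums, one per 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Add the low and high partial sums. */
    unsigned int lo = (unsigned int)_mm_cvtsi128_si32(acc);
    unsigned int hi = (unsigned int)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
    return lo + hi;
}
```

Comparing sad_sse2 against sad_scalar on the same buffers is one simple way to check that both paths agree.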

georgmartius commented 3 years ago

Hi, wow, this is interesting. We really need to make sure this is correct. It would be fantastic if we could reduce the code to so little. Can you create more tests that cover the corner cases, for instance different field sizes? I just want to make sure it really does the right thing.

gabilan commented 3 years ago

I don't think it's necessary to test different field sizes, because the field size is forced to be a multiple of 16. The code below appears in the two places where field->size is determined. I am a bit concerned about test coverage, but only because this method accounts for the majority of CPU time.

#if defined(USE_SSE2) || defined(USE_SSE2_ASM)
  fieldSize     = (fieldSize / 16 + 1) * 16;
  fieldSizeFine = (fieldSizeFine / 16 + 1) * 16;
#endif
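
As a quick check of the expression above, here is a small sketch that isolates it in a helper. The names round_field_size and check_field_sizes are hypothetical and not part of vid.stab, and positive sizes are assumed. The result is always a multiple of 16 and strictly larger than the input, so a size that is already a multiple of 16 is bumped to the next one (16 becomes 32).

```c
#include <assert.h>

/* Hypothetical helper mirroring the expression quoted above. */
static int round_field_size(int size) {
    return (size / 16 + 1) * 16;
}

/* Verify the properties for a range of positive sizes. */
static void check_field_sizes(void) {
    for (int s = 1; s <= 256; s++) {
        int r = round_field_size(s);
        assert(r % 16 == 0);   /* always a multiple of 16 */
        assert(r > s);         /* always grows, even if s is already aligned */
        assert(r - s <= 16);   /* grows by at most 16 */
    }
}
```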

georgmartius commented 3 years ago

Let me check the code myself. Give me a few days, I am very busy at the moment.