gabilan opened 3 years ago
Hi, wow, this is interesting. We really need to make sure this is correct. It would be fantastic if we could reduce the code to such a small amount. Can you create more tests covering the corner cases, for instance different field sizes? I just want to make sure it really does the right thing.
I don't think it's necessary to test different field sizes, because the field size is forced to be a multiple of 16. The code below appears in both places where field->size is determined. I am a bit concerned about test coverage, but only because this method accounts for the majority of CPU time.
#if defined(USE_SSE2) || defined(USE_SSE2_ASM)
/* Pad to the next multiple of 16 bytes (an exact multiple still grows by 16). */
fieldSize = (fieldSize / 16 + 1) * 16;
fieldSizeFine = (fieldSizeFine / 16 + 1) * 16;
#endif
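To make the padding argument concrete, here is a small standalone check of the rounding expression used above. Note that (n / 16 + 1) * 16 always moves to the next multiple of 16, so a size that is already a multiple of 16 still grows by 16 (for example, 32 becomes 48). The helper names pad_to_sse2 and padding_holds are hypothetical; only the expression itself comes from the code above, and sizes are assumed to be non-negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper wrapping the padding expression from the
 * USE_SSE2 blocks: round n up to the next multiple of 16. The result
 * is strictly greater than n, so an exact multiple still gains 16. */
static int pad_to_sse2(int n)
{
    assert(n >= 0); /* assumption: field sizes are never negative */
    return (n / 16 + 1) * 16;
}

/* Exhaustively check sizes 0..max_size: every padded size must be a
 * multiple of 16, larger than the input, and at most 16 larger. */
static bool padding_holds(int max_size)
{
    for (int n = 0; n <= max_size; n++) {
        int padded = pad_to_sse2(n);
        if (padded % 16 != 0 || padded <= n || padded - n > 16)
            return false;
    }
    return true;
}
```

For example, pad_to_sse2(1) and pad_to_sse2(16) give 16 and 32, so any SSE2 loop that walks the field in 16-byte steps never has a partial tail to handle.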
Let me check the code myself. Give me a few days; I am very busy at the moment.
This changes how SAD (Sum of Absolute Differences) is computed in the SSE-optimized path. SSE2 has a dedicated instruction for computing SAD, PSADBW, exposed in C as the _mm_sad_epu8 intrinsic. By switching to the intrinsic, we see a 2x improvement in motion estimation/detection performance. All current tests pass, but more tests may be needed.
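The project's actual SAD routine is not shown in this thread, so the following is only a sketch of the technique under stated assumptions: the function names sad_scalar, sad_sse2, and sad_versions_agree are hypothetical, the inputs are treated as flat byte buffers, the target is x86/x86-64 with SSE2, and the length is assumed to be a multiple of 16 (which the fieldSize padding above guarantees). It also compares the SSE2 path against a scalar reference across many sizes and an all-0 vs all-255 extreme, the kind of corner-case check requested earlier in the thread.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics, including _mm_sad_epu8 (PSADBW) */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Scalar reference: sum of |a[i] - b[i]| over n bytes. */
static uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* SSE2 version (assumes n % 16 == 0). For each 16-byte block, PSADBW
 * produces two partial sums, one per 8-byte half, each stored in the
 * low bits of a 64-bit lane (at most 8 * 255 = 2040 per lane). */
static uint32_t sad_sse2(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the high 64-bit lane into the low one and extract the total. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (uint32_t)_mm_cvtsi128_si32(acc);
}

/* Check both versions agree for every multiple of 16 up to max_n, on
 * pseudo-random data (simple LCG) and on the maximal-difference case. */
static int sad_versions_agree(size_t max_n)
{
    uint8_t *a = malloc(max_n ? max_n : 1);
    uint8_t *b = malloc(max_n ? max_n : 1);
    int ok = (a != NULL && b != NULL);
    uint32_t seed = 12345u;

    for (size_t n = 16; ok && n <= max_n; n += 16) {
        for (size_t i = 0; i < n; i++) {
            seed = seed * 1103515245u + 12345u;
            a[i] = (uint8_t)(seed >> 16);
            seed = seed * 1103515245u + 12345u;
            b[i] = (uint8_t)(seed >> 16);
        }
        ok = sad_scalar(a, b, n) == sad_sse2(a, b, n);

        /* Extreme case: every byte differs by 255. */
        memset(a, 0, n);
        memset(b, 255, n);
        ok = ok && sad_scalar(a, b, n) == sad_sse2(a, b, n)
                && sad_sse2(a, b, n) == 255u * (uint32_t)n;
    }
    free(a);
    free(b);
    return ok;
}
```

Accumulating with _mm_add_epi64 keeps the two per-half sums in separate 64-bit lanes until the end, so there is a single horizontal add per call instead of one per block.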