The commit 24b790d0395ac6ebfd3f3ccfef2b8383a2f4468b states:
Replace SAD function with assembly version ~ 25% faster
However, the ABSDIFF macro (the assembly version) hardcodes an image width of 64 px, while compute_sad_8x8 takes the image width as a parameter. So I ran an experiment: I added timing and debug output to the code and measured compute_sad_8x8(..., (uint16_t) FRAME_SIZE) vs ABSDIFF(...) vs compute_sad_8x8(..., 64). The last one turned out to be the fastest. Also, the speedup from compute_sad_8x8(..., (uint16_t) FRAME_SIZE) to ABSDIFF(...) is far less than 25% for me, more like 8%.
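For anyone who wants to try a similar comparison, here is a minimal host-side sketch of the idea, not the project's code. `sad_8x8_var`, `sad_8x8_w64`, `SKETCH_WIDTH` and the two timing helpers are hypothetical names; the SAD is written in plain portable C rather than the real compute_sad_8x8 or ABSDIFF, and `clock()` stands in for whatever timer is used on the target. It only shows how a stride read at run time can be compared against a stride the compiler sees as a literal.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define SKETCH_WIDTH 64 /* assumption: same width that ABSDIFF hardcodes */

/* Hypothetical stand-in for compute_sad_8x8: the stride is a run-time argument. */
static uint32_t sad_8x8_var(const uint8_t *a, const uint8_t *b, uint16_t stride)
{
    uint32_t acc = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            acc += (uint32_t)abs((int)a[y * stride + x] - (int)b[y * stride + x]);
    return acc;
}

/* Same loop, but called with a literal stride the compiler can fold into the addressing. */
static uint32_t sad_8x8_w64(const uint8_t *a, const uint8_t *b)
{
    return sad_8x8_var(a, b, SKETCH_WIDTH);
}

/* Times `iters` calls with the stride hidden behind a volatile, so it is loaded at run time.
   `img` must hold at least 8 rows of `stride` bytes plus 16; `sink` keeps the calls alive. */
static double time_runtime_stride(const uint8_t *img, uint16_t stride, int iters,
                                  volatile uint32_t *sink)
{
    assert(iters > 0);
    volatile uint16_t s = stride;
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++)
        *sink += sad_8x8_var(img, img + 8, s);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Times `iters` calls of the constant-stride variant on the same data. */
static double time_const_stride(const uint8_t *img, int iters, volatile uint32_t *sink)
{
    assert(iters > 0);
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++)
        *sink += sad_8x8_w64(img, img + 8);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling both timing helpers on the same buffer with a large iteration count gives two numbers to compare. Results on a desktop compiler will not match the microcontroller, so this is only useful for checking the methodology, not for confirming the 8% figure.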
Assuming somebody can reproduce my results, maybe we can just delete the ABSDIFF macro and use compute_sad_8x8(..., 64) instead?