almondyoung / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

sobel re-optimize #444

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The first step in Sobel is inadvertently unoptimized:

#if defined(HAS_ARGBTOBAYERGGROW_SSE2)
  if (TestCpuFlag(kCpuHasSSE2)) {
    ARGBToBayerRow = ARGBToBayerGGRow_Any_SSE2;
    if (IS_ALIGNED(width, 8)) {
      ARGBToBayerRow = ARGBToBayerGGRow_SSE2;
    }
  }
#endif
#if defined(HAS_ARGBTOBAYERROW_SSSE3)
  if (TestCpuFlag(kCpuHasSSSE3)) {
    ARGBToBayerRow = ARGBToBayerRow_Any_SSSE3;
    if (IS_ALIGNED(width, 8)) {
      ARGBToBayerRow = ARGBToBayerRow_SSSE3;
    }
  }
#endif
#if defined(HAS_ARGBTOBAYERGGROW_NEON)
  if (TestCpuFlag(kCpuHasNEON)) {
    ARGBToBayerRow = ARGBToBayerGGRow_Any_NEON;
    if (IS_ALIGNED(width, 8)) {
      ARGBToBayerRow = ARGBToBayerGGRow_NEON;
    }
  }
#endif

and the last step does not handle odd widths.
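
The dispatch pattern in the snippet above, plus an "Any" wrapper that tolerates odd widths, can be sketched as follows. This is a minimal illustration, not libyuv's code: every name in it (ExtractGRow_C, ExtractGRow_Block8, ExtractGRow_Any, ChooseExtractGRow) is hypothetical, the "Block8" kernel is a plain C stand-in for a SIMD kernel that only accepts widths that are multiples of 8, and the B,G,R,A byte order is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical row-function type: ARGB pixels in, one byte per pixel out. */
typedef void (*RowFunc)(const uint8_t* src_argb, uint8_t* dst_g, int width);

/* Portable reference: copy the G byte of each 4-byte pixel.
   Assumes B,G,R,A byte order in memory (an assumption of this sketch). */
void ExtractGRow_C(const uint8_t* src_argb, uint8_t* dst_g, int width) {
  for (int x = 0; x < width; ++x) {
    dst_g[x] = src_argb[x * 4 + 1];
  }
}

/* Stand-in for a SIMD kernel: only valid when width is a multiple of 8. */
void ExtractGRow_Block8(const uint8_t* src_argb, uint8_t* dst_g, int width) {
  assert((width & 7) == 0);
  for (int x = 0; x < width; x += 8) {
    for (int i = 0; i < 8; ++i) {
      dst_g[x + i] = src_argb[(x + i) * 4 + 1];
    }
  }
}

/* "Any" wrapper: block kernel for the multiple-of-8 body,
   portable C for the remaining 0..7 pixels. */
void ExtractGRow_Any(const uint8_t* src_argb, uint8_t* dst_g, int width) {
  int body = width & ~7;
  if (body > 0) {
    ExtractGRow_Block8(src_argb, dst_g, body);
  }
  if (width > body) {
    ExtractGRow_C(src_argb + body * 4, dst_g + body, width - body);
  }
}

/* Pick a row function the way the snippet above does: start from C,
   upgrade to the Any wrapper, then to the raw kernel if width allows. */
RowFunc ChooseExtractGRow(int width, int has_simd) {
  RowFunc fn = ExtractGRow_C;
  if (has_simd) {
    fn = ExtractGRow_Any;
    if ((width & 7) == 0) {
      fn = ExtractGRow_Block8;
    }
  }
  return fn;
}
```

With this shape, a width such as 4094 still runs the block kernel for the first 4088 pixels and falls back to C only for the last 6.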

Testing C versus assembly should show a large difference.
It does show a difference, but a smaller one than expected:

set LIBYUV_DISABLE_ASM=0
set LIBYUV_WIDTH=4096
set LIBYUV_HEIGHT=2048
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
out\release\libyuv_unittest --gtest_filter=*ARGBSobelXY_Opt | findstr /r "^[^_]*_[^_]*ms"
ARGBSobelXY_Opt (12539 ms)

set LIBYUV_DISABLE_ASM=1
set LIBYUV_WIDTH=4094
set LIBYUV_HEIGHT=2048
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=0
out\release\libyuv_unittest --gtest_filter=*ARGBSobelXY_Opt | findstr /r "^[^_]*_[^_]*ms"
ARGBSobelXY_Opt (57926 ms)

set LIBYUV_DISABLE_ASM=0
set LIBYUV_WIDTH=4094
set LIBYUV_HEIGHT=2048
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=0
out\release\libyuv_unittest --gtest_filter=*ARGBSobelXY_Opt | findstr /r "^[^_]*_[^_]*ms"
ARGBSobelXY_Opt (22634 ms)
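
To put the three timings above on one scale, the ratio of the C run to each assembly run can be computed with a small helper. This is only a sketch; Speedup is a hypothetical name, and the 4096-wide run uses a different width and LIBYUV_FLAGS value from the other two, so its ratio is indicative rather than a strict comparison.

```c
#include <assert.h>

/* Speedup of a run relative to a baseline: baseline_ms / run_ms.
   Values above 1.0 mean the run is faster than the baseline. */
double Speedup(double baseline_ms, double run_ms) {
  assert(run_ms > 0.0);
  return baseline_ms / run_ms;
}
```

For example, Speedup(57926, 22634) is roughly 2.6 (C versus assembly at width 4094), and Speedup(57926, 12539) is roughly 4.6 (C at width 4094 versus assembly at width 4096).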

Original issue reported on code.google.com by fbarch...@chromium.org on 26 May 2015 at 11:55

GoogleCodeExporter commented 8 years ago
r1415 does the first step using ARGBToJ400 (the luma calculation of the JPEG 
color space), which supports AVX2, and does the last Sobel step using Any 
functions to handle odd widths.  On AVX2:

set LIBYUV_WIDTH=1278
set LIBYUV_HEIGHT=720
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1

Was C+SSE2+C
out\release\libyuv_unittest_old --gtest_filter=*Sobel* | findstr /r "^[^_]*_[^_]*ms"
[       OK ] libyuvTest.ARGBSobel_Any (4871 ms)
[       OK ] libyuvTest.ARGBSobel_Unaligned (4891 ms)
[       OK ] libyuvTest.ARGBSobel_Invert (4953 ms)
[       OK ] libyuvTest.ARGBSobel_Opt (4891 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Any (3719 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Unaligned (3734 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Invert (3797 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Opt (3719 ms)
[       OK ] libyuvTest.ARGBSobelXY_Any (4891 ms)
[       OK ] libyuvTest.ARGBSobelXY_Unaligned (4906 ms)
[       OK ] libyuvTest.ARGBSobelXY_Invert (4984 ms)
[       OK ] libyuvTest.ARGBSobelXY_Opt (4907 ms)

Now AVX2+SSE2+SSE2
out\release\libyuv_unittest --gtest_filter=*Sobel* | findstr /r "^[^_]*_[^_]*ms"
[       OK ] libyuvTest.ARGBSobel_Any (2531 ms)
[       OK ] libyuvTest.ARGBSobel_Unaligned (2500 ms)
[       OK ] libyuvTest.ARGBSobel_Invert (2610 ms)
[       OK ] libyuvTest.ARGBSobel_Opt (2515 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Any (2157 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Unaligned (2156 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Invert (2219 ms)
[       OK ] libyuvTest.ARGBSobelToPlane_Opt (2156 ms)
[       OK ] libyuvTest.ARGBSobelXY_Any (2500 ms)
[       OK ] libyuvTest.ARGBSobelXY_Unaligned (2531 ms)
[       OK ] libyuvTest.ARGBSobelXY_Invert (2610 ms)
[       OK ] libyuvTest.ARGBSobelXY_Opt (2515 ms)
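
For readers unfamiliar with the luma step mentioned in this comment, a full-range (JPEG-style) luma conversion can be sketched in portable C as below. This is an illustration under stated assumptions, not the code from r1415: the function names are hypothetical, the 38/75/15 weights are a 7-bit fixed-point approximation of 0.299 R + 0.587 G + 0.114 B chosen for this sketch, and pixels are assumed to be stored as B,G,R,A bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Full-range luma in 7-bit fixed point: the weights sum to 128,
   and 64 is added before the shift to round to nearest. */
uint8_t LumaJ(uint8_t r, uint8_t g, uint8_t b) {
  return (uint8_t)((38 * r + 75 * g + 15 * b + 64) >> 7);
}

/* Convert one row of ARGB pixels (assumed B,G,R,A byte order)
   to one luma byte per pixel. Works for any width, odd or even. */
void ARGBToGrayRow_Sketch(const uint8_t* src_argb, uint8_t* dst_y, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src_argb + x * 4;
    dst_y[x] = LumaJ(p[2], p[1], p[0]);
  }
}
```

A SIMD version of this row function would process several pixels per instruction; the scalar loop only shows what is being computed per pixel.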

Original comment by fbarch...@chromium.org on 28 May 2015 at 11:43