KerwinMa / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

rotate 90/270 is slow on Neon #155

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
ARGBRotate90 (58637 ms)
ARGBRotate270 (57182 ms)
ARGBRotate90_Odd (48021 ms)
ARGBRotate270_Odd (46080 ms)

Original issue reported on code.google.com by fbarch...@chromium.org on 15 Nov 2012 at 2:03

GoogleCodeExporter commented 9 years ago
The benchmark times the C and Neon versions together. It improved a little with better Neon code:
57739 - [       OK ] libyuvTest.ARGBRotate270 (57739 ms)
56844 - [       OK ] libyuvTest.ARGBRotate90 (56844 ms)
48112 - [       OK ] libyuvTest.ARGBRotate90_Odd (48112 ms)
46155 - [       OK ] libyuvTest.ARGBRotate270_Odd (46155 ms)
Suggest not benchmarking the C code.
sudo LIBYUV_REPEAT=1000 nice --5 ./libyuv_unittest --gtest_filter=*Rotate*
Note: Google Test filter = *Rotate*
[==========] Running 24 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 24 tests from libyuvTest
[ RUN      ] libyuvTest.ARGBRotate0
filter 0 -     1901 us C -     1471 us OPT
[       OK ] libyuvTest.ARGBRotate0 (3949 ms)
[ RUN      ] libyuvTest.ARGBRotate90
filter 90 -    27876 us C -    26107 us OPT
[       OK ] libyuvTest.ARGBRotate90 (54606 ms)
[ RUN      ] libyuvTest.ARGBRotate180
filter 180 -     2564 us C -     2077 us OPT
[       OK ] libyuvTest.ARGBRotate180 (5214 ms)
[ RUN      ] libyuvTest.ARGBRotate270
filter 270 -    27306 us C -    27698 us OPT
[       OK ] libyuvTest.ARGBRotate270 (55627 ms)
[ RUN      ] libyuvTest.ARGBRotate0_Odd
filter 0 -     1981 us C -     2130 us OPT
[       OK ] libyuvTest.ARGBRotate0_Odd (4681 ms)
[ RUN      ] libyuvTest.ARGBRotate90_Odd
filter 90 -    23905 us C -    23905 us OPT
[       OK ] libyuvTest.ARGBRotate90_Odd (48426 ms)
[ RUN      ] libyuvTest.ARGBRotate180_Odd
filter 180 -     2597 us C -     2622 us OPT
[       OK ] libyuvTest.ARGBRotate180_Odd (5788 ms)
[ RUN      ] libyuvTest.ARGBRotate270_Odd
filter 270 -    22850 us C -    22851 us OPT
[       OK ] libyuvTest.ARGBRotate270_Odd (46312 ms)

ARGBRotate is based on scaling/texture mapping: it writes a row of pixels, 16 
bytes at a time, but reads the source out of order.
Suggest rewriting it like the YUV rotate, which is a transpose.  Doing 4x4 ARGB 
would allow 16-byte reads and writes.  Or possibly do 8x8, which could be read 
with vld4.8, and use code similar to the YUV code to do a register-based 
transpose.  8x8 isn't doable on x86, but could be broken into four 4x4 
transposes or specialized for x64.
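
A rough scalar sketch of the 4x4 idea, for illustration only. This is not libyuv code: the function names RotateBlock4x4_90 and RotateArgb90Blocked are made up, strides are assumed to be in pixels rather than bytes, and width and height are assumed to be multiples of 4. A SIMD version would load each 4-pixel row as one 16-byte vector and do the transpose in registers instead of going through a temporary array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate one 4x4 block of ARGB pixels 90 degrees
 * clockwise. Each pixel is one uint32_t; strides are in pixels.
 * Each source row is 4 pixels = 16 contiguous bytes, and so is each
 * destination row, which is what makes 16-byte reads/writes possible. */
static void RotateBlock4x4_90(const uint32_t* src, int src_stride,
                              uint32_t* dst, int dst_stride) {
  uint32_t t[4][4];
  /* Transpose and mirror: source (x, y) lands at row x, column 3 - y. */
  for (int y = 0; y < 4; ++y) {
    for (int x = 0; x < 4; ++x) {
      t[x][3 - y] = src[y * src_stride + x];
    }
  }
  /* Write the rotated block out one full row at a time. */
  for (int y = 0; y < 4; ++y) {
    for (int x = 0; x < 4; ++x) {
      dst[y * dst_stride + x] = t[y][x];
    }
  }
}

/* Hypothetical driver: rotate a width x height image 90 degrees clockwise
 * into a height x width destination, one 4x4 block at a time. */
void RotateArgb90Blocked(const uint32_t* src, int src_stride,
                         uint32_t* dst, int dst_stride,
                         int width, int height) {
  assert(width % 4 == 0 && height % 4 == 0); /* no edge handling here */
  for (int by = 0; by < height; by += 4) {
    for (int bx = 0; bx < width; bx += 4) {
      /* Source block at (bx, by) maps to destination rows bx..bx+3,
       * columns height-4-by..height-1-by. */
      RotateBlock4x4_90(src + by * src_stride + bx, src_stride,
                        dst + bx * dst_stride + (height - 4 - by),
                        dst_stride);
    }
  }
}
```

Odd sizes (the _Odd tests above) would still need a fallback for the leftover rows and columns, which this sketch leaves out.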

Original comment by fbarch...@google.com on 21 Nov 2012 at 7:09

GoogleCodeExporter commented 9 years ago
Unittest only benchmarks Opt now.
28390 - [       OK ] libyuvTest.ARGBRotate270 (28390 ms)
26922 - [       OK ] libyuvTest.ARGBRotate90 (26922 ms)
24612 - [       OK ] libyuvTest.ARGBRotate90_Odd (24612 ms)
23535 - [       OK ] libyuvTest.ARGBRotate270_Odd (23535 ms)
3274 - [       OK ] libyuvTest.ARGBRotate180_Odd (3274 ms)
2666 - [       OK ] libyuvTest.ARGBRotate0_Odd (2666 ms)
2651 - [       OK ] libyuvTest.ARGBRotate180 (2651 ms)
2122 - [       OK ] libyuvTest.ARGBRotate0 (2122 ms)

Original comment by fbarch...@chromium.org on 26 Nov 2012 at 11:42

GoogleCodeExporter commented 9 years ago

Original comment by fbarch...@google.com on 12 Jan 2013 at 8:55