biotrump / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

I420AlphaToARGB performance #496

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The original was MMX code in Chromium. Replaced it with 3-step AVX2 code.
LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*I420AlphaToARGB*

C
libyuvTest.I420AlphaToARGB_Opt (7145 ms)
    69.49%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_C                                                 
    26.57%  libyuv_unittest  libyuv_unittest      [.] ARGBAttenuateRow_C                                              
     3.63%  libyuv_unittest  libyuv_unittest      [.] ARGBCopyYToAlphaRow_C  

SSSE3
I420AlphaToARGB_Opt (1380 ms)
    43.37%  libyuv_unittest  libyuv_unittest    [.] I422ToARGBRow_SSSE3                                                                          
    36.17%  libyuv_unittest  libyuv_unittest    [.] ARGBAttenuateRow_SSSE3                                                                       
    17.71%  libyuv_unittest  libyuv_unittest    [.] ARGBCopyYToAlphaRow_SSE2                                                                     
     0.60%  libyuv_unittest  libyuv_unittest    [.] I420AlphaToARGB             

AVX2
I420AlphaToARGB_Opt (591 ms)
    50.20%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2                                                                                          
    27.94%  libyuv_unittest  libyuv_unittest      [.] ARGBAttenuateRow_AVX2                                                                                       
    17.66%  libyuv_unittest  libyuv_unittest      [.] ARGBCopyYToAlphaRow_AVX2                                                                                    
     0.58%  libyuv_unittest  libyuv_unittest      [.] I420AlphaToARGB              

without alpha
I420ToARGB_Opt (322 ms)
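
The three row functions in the profiles above (I422ToARGBRow, ARGBCopyYToAlphaRow, ARGBAttenuateRow) suggest a convert, copy-alpha, premultiply pipeline. A minimal scalar sketch of that idea follows. The function names, the BT.601 integer coefficients, and the rounding are assumptions for illustration; this is not libyuv's code and will not match its output bit for bit.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: clamp an int to the 0..255 byte range.
static uint8_t ClampSketch(int v) {
  return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Step 1 (sketch): I420/I422 row to opaque ARGB, stored as B,G,R,A bytes.
// Uses common BT.601 limited-range integer coefficients (an assumption).
void YuvToArgbRowSketch(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                        uint8_t* dst_argb, int width) {
  for (int x = 0; x < width; ++x) {
    int c = (y[x] - 16) * 298;
    int d = u[x / 2] - 128;  // chroma is subsampled 2:1 horizontally
    int e = v[x / 2] - 128;
    dst_argb[4 * x + 0] = ClampSketch((c + 516 * d + 128) >> 8);            // B
    dst_argb[4 * x + 1] = ClampSketch((c - 100 * d - 208 * e + 128) >> 8);  // G
    dst_argb[4 * x + 2] = ClampSketch((c + 409 * e + 128) >> 8);            // R
    dst_argb[4 * x + 3] = 255;                                              // A
  }
}

// Step 2 (sketch): copy the separate alpha plane into the A byte.
void CopyAlphaRowSketch(const uint8_t* a, uint8_t* dst_argb, int width) {
  for (int x = 0; x < width; ++x) dst_argb[4 * x + 3] = a[x];
}

// Step 3 (sketch): premultiply B, G, R by alpha, rounding to nearest.
void AttenuateRowSketch(uint8_t* argb, int width) {
  for (int x = 0; x < width; ++x) {
    int alpha = argb[4 * x + 3];
    for (int ch = 0; ch < 3; ++ch) {
      argb[4 * x + ch] =
          static_cast<uint8_t>((argb[4 * x + ch] * alpha + 127) / 255);
    }
  }
}

// Runs the three steps over one row.
void I420AlphaRowSketch(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                        const uint8_t* a, uint8_t* dst_argb, int width) {
  YuvToArgbRowSketch(y, u, v, dst_argb, width);
  CopyAlphaRowSketch(a, dst_argb, width);
  AttenuateRowSketch(dst_argb, width);
}
```

Each step walks the destination row again, which is consistent with the alpha path costing noticeably more than I420ToARGB alone in the timings above.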

Original issue reported on code.google.com by fbarch...@google.com on 25 Sep 2015 at 1:05

GoogleCodeExporter commented 9 years ago
clang on OS X (32-bit build, -arch i386) runs out of registers:
FAILED: /Volumes/data/b/build/goma/gomacc 
../../third_party/llvm-build/Release+Asserts/bin/clang++ -MMD -MF 
obj/source/libyuv.row_gcc.o.d -DV8_DEPRECATION_WARNINGS 
-D__ASSERT_MACROS_DEFINE_VERSIONS_WITHOUT_UNDERSCORE=0 -DCHROMIUM_BUILD 
-DCR_CLANG_REVISION=242792-1 -DUSE_LIBJPEG_TURBO=1 -DENABLE_ONE_CLICK_SIGNIN 
-DENABLE_PRE_SYNC_BACKUP -DENABLE_REMOTING=1 -DENABLE_WEBRTC=1 
-DENABLE_MEDIA_ROUTER=1 -DENABLE_PEPPER_CDMS -DENABLE_CONFIGURATION_POLICY 
-DENABLE_NOTIFICATIONS -DENABLE_HIDPI=1 
-DSYSTEM_NATIVELY_SIGNALS_MEMORY_PRESSURE -DDONT_EMBED_BUILD_METADATA 
-DENABLE_TASK_MANAGER=1 -DENABLE_EXTENSIONS=1 -DENABLE_PLUGIN_INSTALLATION=1 
-DENABLE_PLUGINS=1 -DENABLE_SESSION_SERVICE=1 -DENABLE_THEMES=1 
-DENABLE_AUTOFILL_DIALOG=1 -DENABLE_BACKGROUND=1 -DENABLE_GOOGLE_NOW=1 
-DCLD_VERSION=2 -DENABLE_PRINTING=1 -DENABLE_BASIC_PRINTING=1 
-DENABLE_PRINT_PREVIEW=1 -DENABLE_SPELLCHECK=1 -DUSE_PLATFORM_SPELLCHECKER=1 
-DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 -DENABLE_SETTINGS_APP=1 
-DENABLE_SUPERVISED_USERS=1 -DENABLE_SERVICE_DISCOVERY=1 
-DENABLE_WIFI_BOOTSTRAPPING=1 -DV8_USE_EXTERNAL_STARTUP_DATA 
-DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL 
-DSAFE_BROWSING_SERVICE -DHAVE_JPEG -DUSE_LIBPCI=1 -DUSE_OPENSSL=1 
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DNDEBUG -DNVALGRIND 
-DDYNAMIC_ANNOTATIONS_ENABLED=0 -D_FORTIFY_SOURCE=2 -Igen -I../../include 
-I../.. -I../../chromium/src/third_party/libjpeg_turbo -isysroot 
/Applications/Xcode6.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk
-O2 -gdwarf-2 -fvisibility=hidden -Werror -Wnewline-eof 
-mmacosx-version-min=10.6 -arch i386 -Wall -Wendif-labels -Wextra 
-Wno-unused-parameter -Wno-missing-field-initializers 
-Wno-selector-type-mismatch -Wpartial-availability -Wheader-hygiene 
-Wno-char-subscripts -Wno-unneeded-internal-declaration 
-Wno-covered-switch-default -Wstring-conversion -Wno-c++11-narrowing 
-Wno-deprecated-register -Wno-inconsistent-missing-override 
-Wno-shift-negative-value -std=c++11 -fno-rtti -fno-exceptions 
-fvisibility-inlines-hidden -fno-threadsafe-statics -Xclang -load -Xclang 
/Volumes/data/b/build/slave/mac32/build/src/third_party/llvm-build/Release+Asserts/lib/libFindBadConstructs.dylib
-Xclang -add-plugin -Xclang 
find-bad-constructs -Xclang -plugin-arg-find-bad-constructs -Xclang 
check-templates -fcolor-diagnostics -fno-strict-aliasing  -c 
../../source/row_gcc.cc -o obj/source/libyuv.row_gcc.o
../../source/row_gcc.cc:1667:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
../../source/row_gcc.cc:1695:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
../../source/row_gcc.cc:2085:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
../../source/row_gcc.cc:2118:5: error: inline assembly requires more registers than available
    "sub       %[u_buf],%[v_buf]               \n"
    ^
4 errors generated.
ninja: build stopped: subcommand failed.

Original comment by fbarch...@google.com on 25 Sep 2015 at 9:43

GoogleCodeExporter commented 9 years ago
C
I420AlphaToARGB_Any (5323 ms)
I420AlphaToARGB_Unaligned (5321 ms)
I420AlphaToARGB_Invert (5332 ms)
I420AlphaToARGB_Opt (5293 ms)
I420AlphaToARGB_Premult (7206 ms)

SSSE3
I420AlphaToARGB_Any (454 ms)
I420AlphaToARGB_Unaligned (425 ms)
I420AlphaToARGB_Invert (411 ms)
I420AlphaToARGB_Opt (416 ms)
I420AlphaToARGB_Premult (730 ms)

AVX2
I420AlphaToARGB_Any (377 ms)
I420AlphaToARGB_Unaligned (329 ms)
I420AlphaToARGB_Invert (324 ms)
I420AlphaToARGB_Opt (323 ms)
I420AlphaToARGB_Premult (483 ms)

Original comment by fbarch...@google.com on 25 Sep 2015 at 11:20

GoogleCodeExporter commented 9 years ago
When we try to compile this code for 32-bit x86 with gcc 4.8/4.9, we get
error: 'asm' operand has impossible constraints
for libyuv::I422AlphaToARGBRow_SSSE3.

I'm not certain what that really means, but it seems to indicate that the 
inline assembly is running out of registers. Do you have any suggestions on 
how to get around that?
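
One possible workaround, offered as an assumption and not as the fix adopted in libyuv: 32-bit x86 has only a handful of general-purpose registers, so an inline asm block with many "r" operands (several plane pointers, a destination, a width) can exceed what the compiler is able to allocate. Letting the loop counter live in memory with an "m" constraint frees one register. The sketch below uses a hypothetical function, CopyAlphaRowMemWidth, that is not part of libyuv, and falls back to plain C++ on other targets.

```cpp
#include <cstdint>

// Hypothetical example: copy an alpha plane into the A byte of ARGB pixels.
// The width operand uses "+m" so it stays in memory and does not occupy a
// general-purpose register. Requires width > 0.
void CopyAlphaRowMemWidth(const uint8_t* src_a, uint8_t* dst_argb, int width) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  asm volatile(
      "1:                                         \n"
      "movb      (%[src_a]),%%al                  \n"  // load one alpha byte
      "movb      %%al,3(%[dst_argb])              \n"  // store into A channel
      "lea       1(%[src_a]),%[src_a]             \n"  // advance source
      "lea       4(%[dst_argb]),%[dst_argb]       \n"  // advance one pixel
      "subl      $1,%[width]                      \n"  // counter is in memory
      "jg        1b                               \n"
      : [src_a] "+r"(src_a),        // pointers still need registers
        [dst_argb] "+r"(dst_argb),
        [width] "+m"(width)         // memory operand: no register used
      :
      : "memory", "cc", "eax");
#else
  // Portable fallback for targets without x86 GNU-style inline asm.
  for (int x = 0; x < width; ++x) dst_argb[4 * x + 3] = src_a[x];
#endif
}
```

The trade-off is a read-modify-write on memory every iteration, which is usually minor next to the SIMD work in a real row function. Other options are building the 32-bit target without a frame pointer or without PIC so the compiler has more registers available, or falling back to the C row function on that target.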

Original comment by brat...@opera.com on 27 Oct 2015 at 2:17