almondyoung / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

Port AVX2 to gcc #269

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
So far the AVX2 code is Visual C only.
Porting it to gcc will require a compiler version check:
For OS X, LLVM 3.1 (Xcode 4.5) is required.
For Linux, gcc 4.7 is required.
If AVX2 hardware is not available, the Intel emulator (SDE) can be used.
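A compiler version check along these lines could gate the gcc/clang AVX2 code. This is a minimal sketch, not libyuv's actual check: the SKETCH_* macro names and the sketch_compiler_has_avx2 helper are hypothetical, and only the version floors (LLVM 3.1, gcc 4.7) come from the comment above. Note that Apple's Xcode clang reports its own version numbers rather than the upstream LLVM release, so a production check on OS X would need to account for that.

```c
/* Hypothetical helper: nonzero when version maj.min >= req_maj.req_min.
 * It expands to an integer constant expression, so it works inside #if. */
#define SKETCH_VERSION_AT_LEAST(maj, min, req_maj, req_min) \
  ((maj) > (req_maj) || ((maj) == (req_maj) && (min) >= (req_min)))

#if defined(__i386__) || defined(__x86_64__)
#if defined(__clang__)
/* Test clang first: clang also defines __GNUC__ (typically as 4.2). */
#if SKETCH_VERSION_AT_LEAST(__clang_major__, __clang_minor__, 3, 1)
#define SKETCH_COMPILER_HAS_AVX2 1
#endif
#elif defined(__GNUC__)
#if SKETCH_VERSION_AT_LEAST(__GNUC__, __GNUC_MINOR__, 4, 7)
#define SKETCH_COMPILER_HAS_AVX2 1
#endif
#endif
#endif

/* Default: no AVX2 code from this compiler (non-x86 or too old). */
#ifndef SKETCH_COMPILER_HAS_AVX2
#define SKETCH_COMPILER_HAS_AVX2 0
#endif

/* Exposes the compile-time decision so it can be inspected at run time. */
static int sketch_compiler_has_avx2(void) { return SKETCH_COMPILER_HAS_AVX2; }
```

The version comparison is kept in one macro so the clang and gcc branches cannot drift apart; Visual C is not covered because its AVX2 code already exists.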

Original issue reported on code.google.com by fbarch...@google.com on 13 Sep 2013 at 10:54

GoogleCodeExporter commented 8 years ago
r797 adds gcc 4.7 AVX2 Polynomial

Original comment by fbarch...@google.com on 24 Sep 2013 at 1:21

GoogleCodeExporter commented 8 years ago
ARGBShuffle ported to gcc.

Original comment by fbarch...@google.com on 1 Oct 2013 at 3:57

GoogleCodeExporter commented 8 years ago
r1035 fixes AVX2 detection.
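For reference, the usual run-time AVX2 check has three parts: the CPU must report OSXSAVE and AVX, the OS must have enabled XMM and YMM state in XCR0 (read with XGETBV), and CPUID leaf 7 must report AVX2. The sketch below follows that sequence using gcc/clang's <cpuid.h>; it is an illustration under that assumption, not necessarily what r1035 changed, and sketch_cpu_has_avx2 is a hypothetical name.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 when both the CPU and the OS support AVX2. */
static int sketch_cpu_has_avx2(void) {
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;
  /* CPUID leaf 1: ECX bit 27 = OSXSAVE, ECX bit 28 = AVX. */
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
  if (!(ecx & (1u << 27)) || !(ecx & (1u << 28))) return 0;
  /* XGETBV is only legal once OSXSAVE is set. XCR0 bits 1 and 2 mean the
   * OS saves XMM and YMM registers across context switches. */
  unsigned int xcr0_lo, xcr0_hi;
  __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
  (void)xcr0_hi;
  if ((xcr0_lo & 0x6u) != 0x6u) return 0;
  /* CPUID leaf 7, subleaf 0: EBX bit 5 = AVX2. */
  if (__get_cpuid_max(0, 0) < 7) return 0;
  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  return (ebx & (1u << 5)) != 0;
#else
  return 0; /* Not an x86 target. */
#endif
}
```

Checking only the CPUID AVX2 bit is a common mistake: on an OS that does not save YMM state, AVX2 code would corrupt registers, which is why the XGETBV step comes first.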

Original comment by fbarch...@chromium.org on 15 Jul 2014 at 12:50

GoogleCodeExporter commented 8 years ago
r1122 changed the #elif's to #endif/#if pairs so that AVX2 variants can be added, but this
introduced a build error on the Mac 32-bit build:

[66/267 | 7.503] LINK genmacro, POSTBUILDS
FAILED: /Volumes/data/b/build/goma/gomacc 
../../third_party/llvm-build/Release+Asserts/bin/clang++ -MMD -MF 
obj/source/libyuv.rotate.o.d -DV8_DEPRECATION_WARNINGS 
-D__ASSERT_MACROS_DEFINE_VERSIONS_WITHOUT_UNDERSCORE=0 -DCHROMIUM_BUILD 
-DCR_CLANG_REVISION=217949 -DCOMPONENT_BUILD -DUSE_LIBJPEG_TURBO=1 
-DENABLE_ONE_CLICK_SIGNIN -DENABLE_PRE_SYNC_BACKUP -DENABLE_REMOTING=1 
-DENABLE_WEBRTC=1 -DENABLE_PEPPER_CDMS -DENABLE_CONFIGURATION_POLICY 
-DENABLE_NOTIFICATIONS -DENABLE_HIDPI=1 
-DDISCARDABLE_MEMORY_ALWAYS_SUPPORTED_NATIVELY 
-DSYSTEM_NATIVELY_SIGNALS_MEMORY_PRESSURE -DDCHECK_ALWAYS_ON=1 
-DENABLE_EGLIMAGE=1 -DENABLE_TASK_MANAGER=1 -DENABLE_EXTENSIONS=1 
-DENABLE_PLUGIN_INSTALLATION=1 -DENABLE_PLUGINS=1 -DENABLE_SESSION_SERVICE=1 
-DENABLE_THEMES=1 -DENABLE_AUTOFILL_DIALOG=1 -DENABLE_BACKGROUND=1 
-DENABLE_GOOGLE_NOW=1 -DCLD_VERSION=2 -DCLD2_DATA_SOURCE=static 
-DENABLE_FULL_PRINTING=1 -DENABLE_PRINTING=1 -DENABLE_SPELLCHECK=1 
-DENABLE_CAPTIVE_PORTAL_DETECTION=1 -DENABLE_APP_LIST=1 -DENABLE_SETTINGS_APP=1 
-DENABLE_MANAGED_USERS=1 -DENABLE_SERVICE_DISCOVERY=1 
-DENABLE_WIFI_BOOTSTRAPPING=1 -DENABLE_LOAD_COMPLETION_HACKS=1 -DHAVE_JPEG 
-DUSE_OPENSSL=1 -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DWTF_USE_DYNAMIC_ANNOTATIONS=1 
-Igen -I../../include -I../.. -I../../chromium/src/third_party/libjpeg_turbo 
-isysroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk
-O0 -fvisibility=hidden -Werror -Wnewline-eof
-mmacosx-version-min=10.6 -arch i386 -Wendif-labels -Wno-unused-parameter 
-Wno-missing-field-initializers -Wno-selector-type-mismatch -Wheader-hygiene 
-Wno-char-subscripts -Wno-unneeded-internal-declaration 
-Wno-covered-switch-default -Wstring-conversion -Wno-c++11-narrowing 
-Wno-deprecated-register -Wno-unused-local-typedef -std=gnu++11 -fno-rtti 
-fno-exceptions -fvisibility-inlines-hidden -fno-threadsafe-statics -Xclang 
-load -Xclang
/Volumes/data/b/build/slave/mac32/build/src/third_party/llvm-build/Release+Asserts/lib/libFindBadConstructs.dylib
-Xclang -add-plugin -Xclang
find-bad-constructs -fcolor-diagnostics -fno-strict-aliasing 
-fstack-protector-all -Wno-undefined-bool-conversion 
-Wno-tautological-undefined-compare  -c ../../source/rotate.cc -o 
obj/source/libyuv.rotate.o
../../source/rotate.cc:38:9: error: 'DECLARE_FUNCTION' macro redefined 
[-Werror,-Wmacro-redefined]
#define DECLARE_FUNCTION(name)                                                 \
        ^
../../source/rotate.cc:26:9: note: previous definition is here
#define DECLARE_FUNCTION(name)                                                 \
        ^
1 error generated.
ninja: build stopped: subcommand failed.

/Volumes/data/b/build/goma/goma_ctl.sh stat
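The error arises because two blocks that used to be mutually exclusive (#if ... #elif) became independent #if blocks, so on a target where both conditions hold, DECLARE_FUNCTION is defined twice with different bodies, and -Werror turns -Wmacro-redefined into a failure. A minimal sketch of one way to keep independent blocks from colliding is to guard each definition with #ifndef. The SKETCH_* macros and the ExampleRow_SSSE3 name are hypothetical stand-ins, with both conditions set to 1 to mimic a target where both blocks apply.

```c
/* Hypothetical stand-ins for two platform conditions that used to be
 * #if/#elif branches and are now independent #if blocks. */
#define SKETCH_TARGET_MACHO 1
#define SKETCH_TARGET_X86 1

#if SKETCH_TARGET_MACHO
/* Mach-O symbol names carry a leading underscore. */
#ifndef SKETCH_SYMBOL
#define SKETCH_SYMBOL(name) "_" #name
#endif
#endif

#if SKETCH_TARGET_X86 /* previously an #elif branch */
/* Without this #ifndef guard, this differing second definition would
 * trigger -Wmacro-redefined (an error under -Werror). */
#ifndef SKETCH_SYMBOL
#define SKETCH_SYMBOL(name) #name
#endif
#endif

/* Returns the symbol spelling chosen by whichever block defined it first. */
static const char* sketch_symbol_name(void) {
  return SKETCH_SYMBOL(ExampleRow_SSSE3);
}
```

The guard makes the first active block win silently; rewriting the conditions so they are mutually exclusive again is often clearer, since it keeps the precedence explicit.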

Original comment by fbarch...@google.com on 16 Oct 2014 at 9:21

GoogleCodeExporter commented 8 years ago
r1127 ports I420ToBGRA to AVX2 for Windows.

Original comment by fbarch...@google.com on 20 Oct 2014 at 9:28

GoogleCodeExporter commented 8 years ago
r1131 ports I420ToBGRA to gcc
SSSE3 480.2 ms
AVX2 385.5 ms
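Put as ratios, the two timings above work out to roughly a 1.25x speedup for AVX2 over SSSE3 (simple arithmetic on the reported numbers):

```latex
\text{speedup} = \frac{480.2\,\text{ms}}{385.5\,\text{ms}} \approx 1.246,
\qquad
\text{time saved} = \frac{480.2 - 385.5}{480.2} = \frac{94.7}{480.2} \approx 19.7\%
```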

Original comment by fbarch...@google.com on 21 Oct 2014 at 4:44

GoogleCodeExporter commented 8 years ago
Estimate of completeness, treating NEON as complete.
NEON
findstr Row_NEON *.h | wc -l
86 functions

SSE2/SSSE3...
findstr Row_SS *.h | wc -l
90 functions. Some are duplicates (SSE2 + SSSE3 versions of the same function).

AVX2
findstr Row_AVX2 *.h | wc -l
30

30/86 = 34.9%

findstr Row_AVX2.*\( *_win.cc  | wc -l
     28

findstr Row_AVX2.*\( *_posix.cc  | wc -l
      5
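Reading these counts as ratios gives the two completeness figures below. This assumes the _win.cc and _posix.cc matches correspond to implemented AVX2 row functions; the pattern could also match calls or comments, so the numbers are approximate.

```latex
\frac{\text{AVX2 row functions}}{\text{NEON row functions}} = \frac{30}{86} \approx 34.9\%,
\qquad
\frac{\text{AVX2 rows in posix files}}{\text{AVX2 rows in win files}} = \frac{5}{28} \approx 17.9\%
```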

Original comment by fbarch...@google.com on 29 Oct 2014 at 6:46

GoogleCodeExporter commented 8 years ago
On OS X, these are the slowest 'Opt' tests:
LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 \
  out/Release/libyuv_unittest --gtest_filter=**Opt | \
  sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | grep ms
3567 - [       OK ] libyuvTest.TestFixedDiv1_Opt (3567 ms)
3028 - [       OK ] libyuvTest.TestFixedDiv_Opt (3028 ms)
2429 - [       OK ] libyuvTest.ARGBBlur_Opt (2429 ms)
1630 - [       OK ] libyuvTest.BayerBGGRToI420_Opt (1630 ms)
1627 - [       OK ] libyuvTest.BayerGRBGToI420_Opt (1627 ms)
1592 - [       OK ] libyuvTest.BayerRGGBToI420_Opt (1592 ms)
1582 - [       OK ] libyuvTest.BayerGBRGToI420_Opt (1582 ms)
1510 - [       OK ] libyuvTest.ARGBBlurSmall_Opt (1510 ms)
1378 - [       OK ] libyuvTest.BayerGRBGToARGB_Opt (1378 ms)
1378 - [       OK ] libyuvTest.BayerBGGRToARGB_Opt (1378 ms)
1337 - [       OK ] libyuvTest.BayerGBRGToARGB_Opt (1337 ms)
1336 - [       OK ] libyuvTest.BayerRGGBToARGB_Opt (1336 ms)
1168 - [       OK ] libyuvTest.ARGBToI411_Opt (1168 ms)
1099 - [       OK ] libyuvTest.I420ToI444_Opt (1099 ms)
976 - [       OK ] libyuvTest.I420ToARGB1555_Opt (976 ms)
928 - [       OK ] libyuvTest.I420ToRGB565_Opt (928 ms)
888 - [       OK ] libyuvTest.NV12ToRGB565_Opt (888 ms)
886 - [       OK ] libyuvTest.NV21ToRGB565_Opt (886 ms)
873 - [       OK ] libyuvTest.ARGBSobel_Opt (873 ms)

Original comment by fbarch...@chromium.org on 10 Nov 2014 at 6:57

GoogleCodeExporter commented 8 years ago
The initial port is complete but does not pass the unit tests. To install the
Intel SDE emulator:

cd ia32
sudo chgrp procmod pinbin
chmod g+s pinbin
cd ../intel64
sudo chgrp procmod pinbin
chmod g+s pinbin

Then run the unit tests under SDE, emulating Haswell (-hsw):

.../sde-external-7.8.0-2014-10-02-mac/sde -ast -hsw -- out/Release/libyuv_unittest

[----------] Global test environment tear-down
[==========] 887 tests from 1 test case ran. (79889 ms total)
[  PASSED  ] 833 tests.
[  FAILED  ] 54 tests, listed below:
[  FAILED  ] libyuvTest.I420ToARGB_Any
[  FAILED  ] libyuvTest.I420ToARGB_Unaligned
[  FAILED  ] libyuvTest.I420ToARGB_Invert
[  FAILED  ] libyuvTest.I420ToARGB_Opt
[  FAILED  ] libyuvTest.I422ToARGB_Any
[  FAILED  ] libyuvTest.I422ToARGB_Unaligned
[  FAILED  ] libyuvTest.I422ToARGB_Invert
[  FAILED  ] libyuvTest.I422ToARGB_Opt
[  FAILED  ] libyuvTest.I420ToBayerBGGR_Any
[  FAILED  ] libyuvTest.I420ToBayerBGGR_Unaligned
[  FAILED  ] libyuvTest.I420ToBayerBGGR_Invert
[  FAILED  ] libyuvTest.I420ToBayerBGGR_Opt
[  FAILED  ] libyuvTest.I420ToBayerRGGB_Any
[  FAILED  ] libyuvTest.I420ToBayerRGGB_Unaligned
[  FAILED  ] libyuvTest.I420ToBayerRGGB_Invert
[  FAILED  ] libyuvTest.I420ToBayerRGGB_Opt
[  FAILED  ] libyuvTest.I420ToBayerGBRG_Any
[  FAILED  ] libyuvTest.I420ToBayerGBRG_Unaligned
[  FAILED  ] libyuvTest.I420ToBayerGBRG_Invert
[  FAILED  ] libyuvTest.I420ToBayerGBRG_Opt
[  FAILED  ] libyuvTest.I420ToBayerGRBG_Any
[  FAILED  ] libyuvTest.I420ToBayerGRBG_Unaligned
[  FAILED  ] libyuvTest.I420ToBayerGRBG_Invert
[  FAILED  ] libyuvTest.I420ToBayerGRBG_Opt
[  FAILED  ] libyuvTest.ARGBToI420_Any
[  FAILED  ] libyuvTest.ARGBToI420_Unaligned
[  FAILED  ] libyuvTest.ARGBToI420_Invert
[  FAILED  ] libyuvTest.ARGBToI420_Opt
[  FAILED  ] libyuvTest.ARGBToI411_Any
[  FAILED  ] libyuvTest.ARGBToI411_Unaligned
[  FAILED  ] libyuvTest.ARGBToI411_Invert
[  FAILED  ] libyuvTest.ARGBToI411_Opt
[  FAILED  ] libyuvTest.UYVYToI422_Any
[  FAILED  ] libyuvTest.UYVYToI422_Unaligned
[  FAILED  ] libyuvTest.UYVYToI422_Invert
[  FAILED  ] libyuvTest.UYVYToI422_Opt
[  FAILED  ] libyuvTest.ARGBToI400_Any
[  FAILED  ] libyuvTest.ARGBToI400_Unaligned
[  FAILED  ] libyuvTest.ARGBToI400_Invert
[  FAILED  ] libyuvTest.ARGBToI400_Opt
[  FAILED  ] libyuvTest.ARGBToI400_Random
[  FAILED  ] libyuvTest.ARGBToJ400_Any
[  FAILED  ] libyuvTest.ARGBToJ400_Unaligned
[  FAILED  ] libyuvTest.ARGBToJ400_Invert
[  FAILED  ] libyuvTest.ARGBToJ400_Opt
[  FAILED  ] libyuvTest.ARGBToJ400_Random
[  FAILED  ] libyuvTest.ARGBToARGBMirror_Any
[  FAILED  ] libyuvTest.ARGBToARGBMirror_Unaligned
[  FAILED  ] libyuvTest.ARGBToARGBMirror_Invert
[  FAILED  ] libyuvTest.ARGBToARGBMirror_Opt
[  FAILED  ] libyuvTest.ARGBToARGBMirror_Random
[  FAILED  ] libyuvTest.TestARGBMirror
[  FAILED  ] libyuvTest.ARGBRotate180
[  FAILED  ] libyuvTest.ARGBRotate180_Odd

54 FAILED TESTS
  YOU HAVE 1 DISABLED TEST

Original comment by phthor...@gmail.com on 12 Dec 2014 at 5:54

GoogleCodeExporter commented 8 years ago
r1195 disables the affected AVX2 functions. All tests now pass.
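Disabling a SIMD variant without deleting it typically relies on the libyuv-style pattern of a per-function HAS_* macro plus a run-time dispatch that falls back to the C version. The sketch below is a simplified illustration of that pattern, not libyuv's code: every Sketch* name and SKETCH_HAS_ARGBMIRRORROW_AVX2 are hypothetical, and the cpu_has_avx2 flag stands in for a real CPU check.

```c
#include <stdint.h>

/* Hypothetical switch: left undefined, so the AVX2 path is compiled out
 * until it passes the unit tests. */
/* #define SKETCH_HAS_ARGBMIRRORROW_AVX2 */

#if defined(SKETCH_HAS_ARGBMIRRORROW_AVX2)
void SketchARGBMirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width);
#endif

/* Portable C fallback: reverses the order of 4-byte ARGB pixels in a row. */
static void SketchARGBMirrorRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src + 4 * (width - 1 - x);
    for (int c = 0; c < 4; ++c) dst[4 * x + c] = p[c];
  }
}

typedef void (*SketchMirrorRowFn)(const uint8_t*, uint8_t*, int);

/* Picks the fastest row function that is both compiled in and supported. */
static SketchMirrorRowFn sketch_select_mirror_row(int cpu_has_avx2) {
  SketchMirrorRowFn fn = SketchARGBMirrorRow_C;
#if defined(SKETCH_HAS_ARGBMIRRORROW_AVX2)
  if (cpu_has_avx2) fn = SketchARGBMirrorRow_AVX2;
#else
  (void)cpu_has_avx2; /* AVX2 variant disabled at compile time. */
#endif
  return fn;
}
```

Because the selection falls back to the C row function, even an AVX2-capable CPU gets correct output while the AVX2 variant stays disabled.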

Original comment by fbarch...@google.com on 12 Dec 2014 at 7:32

GoogleCodeExporter commented 8 years ago
Fixed in r1207.
All the Windows AVX2 functions are now ported to GCC / NaCl.

Original comment by fbarch...@google.com on 17 Dec 2014 at 12:08

GoogleCodeExporter commented 8 years ago

Original comment by fbarch...@google.com on 17 Dec 2014 at 12:08