buliaoyin / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

NV12ToARGB in 1 step AVX2 #403

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The various YUV-to-ARGB conversions are each implemented in one of four ways:
1. AVX2
2. SSSE3
3. Multi-step SSSE3 row functions chained by a C wrapper
4. C

On Neon, all YUV formats are supported natively, along with several RGB outputs.
To achieve the same on Intel:
1. Write macros that READ any YUV format.
2. Write C wrappers for multi-step AVX2.
3. Implement YUV conversion via a matrix so that more formats can use SSSE3/AVX2 instead of C.
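
A scalar sketch of steps 1 and 3 may make the idea concrete: one read macro per source layout feeds a single kernel whose coefficients come from a table, so a new format or color space needs only a new macro or a new table. Everything named Sketch... or SKETCH_... below is hypothetical and is not libyuv's API. The BT.601 values are a commonly published integer approximation, and the full-range values are rounded from the usual 1.772 / 0.344 / 0.714 / 1.402 factors; neither is taken from libyuv's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical coefficient table: fixed point with 8 fractional bits. */
typedef struct {
  int y_bias; /* offset subtracted from Y */
  int yg;     /* Y gain */
  int ub;     /* U weight in B */
  int ug;     /* U weight in G (subtracted) */
  int vg;     /* V weight in G (subtracted) */
  int vr;     /* V weight in R */
} SketchYuvMatrix;

/* Assumed values: BT.601 limited range and a full-range (JPEG-style) variant. */
static const SketchYuvMatrix kSketchBt601 = {16, 298, 516, 100, 208, 409};
static const SketchYuvMatrix kSketchJpeg = {0, 256, 454, 88, 183, 359};

/* NV12: a Y plane plus one interleaved U,V pair per two pixels. */
#define SKETCH_READ_NV12(src_y, src_uv, x, y, u, v) \
  do {                                              \
    (y) = (src_y)[(x)];                             \
    (u) = (src_uv)[((x) / 2) * 2];                  \
    (v) = (src_uv)[((x) / 2) * 2 + 1];              \
  } while (0)

/* YUY2: two pixels packed into four bytes as Y0 U Y1 V. */
#define SKETCH_READ_YUY2(src, x, y, u, v) \
  do {                                    \
    (y) = (src)[(x) * 2];                 \
    (u) = (src)[((x) / 2) * 4 + 1];       \
    (v) = (src)[((x) / 2) * 4 + 3];       \
  } while (0)

static uint8_t SketchClamp255(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Shared kernel: writes one pixel in B, G, R, A byte order.
   Integer division truncates toward zero; negative results clamp to 0. */
static void SketchYuvPixel(int y, int u, int v, const SketchYuvMatrix* m,
                           uint8_t* dst) {
  int c = (y - m->y_bias) * m->yg;
  int d = u - 128;
  int e = v - 128;
  dst[0] = SketchClamp255((c + m->ub * d + 128) / 256);
  dst[1] = SketchClamp255((c - m->ug * d - m->vg * e + 128) / 256);
  dst[2] = SketchClamp255((c + m->vr * e + 128) / 256);
  dst[3] = 255;
}

static void SketchNV12ToARGBRow(const uint8_t* src_y, const uint8_t* src_uv,
                                uint8_t* dst_argb, int width,
                                const SketchYuvMatrix* m) {
  assert(width > 0);
  for (int x = 0; x < width; ++x) {
    int y, u, v;
    SKETCH_READ_NV12(src_y, src_uv, x, y, u, v);
    SketchYuvPixel(y, u, v, m, dst_argb + 4 * x);
  }
}

static void SketchYUY2ToARGBRow(const uint8_t* src_yuy2, uint8_t* dst_argb,
                                int width, const SketchYuvMatrix* m) {
  assert(width > 0);
  for (int x = 0; x < width; ++x) {
    int y, u, v;
    SKETCH_READ_YUY2(src_yuy2, x, y, u, v);
    SketchYuvPixel(y, u, v, m, dst_argb + 4 * x);
  }
}
```

Passing &kSketchJpeg instead of &kSketchBt601 reuses the same row functions for full-range input, which is the point of driving the kernel from a matrix.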

Original issue reported on code.google.com by fbarch...@google.com on 13 Feb 2015 at 10:00

GoogleCodeExporter commented 9 years ago
These functions are currently C wrappers around SSSE3 row functions:

findstr void.*SSE row_common.cc
void I422ToRGB565Row_SSSE3(const uint8* src_y,
void I422ToARGB1555Row_SSSE3(const uint8* src_y,
void I422ToARGB4444Row_SSSE3(const uint8* src_y,
void NV12ToRGB565Row_SSSE3(const uint8* src_y,
void NV21ToRGB565Row_SSSE3(const uint8* src_y,
void YUY2ToARGBRow_SSSE3(const uint8* src_yuy2,
void UYVYToARGBRow_SSSE3(const uint8* src_uyvy,

They are candidates for:
1. Direct conversions: use macros to read and store.
2. AVX2 wrappers of a similar nature: port the functions they call to AVX2.
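
The difference between a wrapper and a direct conversion can be sketched in scalar C. The two-step version converts to a temporary ARGB row and then packs it; the direct version reads, converts and stores in one pass with no intermediate buffer. All Demo... names, the kDemoMaxWidth cap and the uint16_t destination type are assumptions for illustration; the real row functions are SIMD and have different signatures.

```c
#include <assert.h>
#include <stdint.h>

enum { kDemoMaxWidth = 64 }; /* hypothetical cap so the temp row fits on the stack */

static uint8_t DemoClamp255(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One NV12 pixel to B, G, R, using a common BT.601 integer approximation. */
static void DemoNV12Pixel(const uint8_t* src_y, const uint8_t* src_uv, int x,
                          uint8_t* bgr) {
  int c = 298 * (src_y[x] - 16);
  int d = src_uv[(x / 2) * 2] - 128;     /* U */
  int e = src_uv[(x / 2) * 2 + 1] - 128; /* V */
  bgr[0] = DemoClamp255((c + 516 * d + 128) / 256);
  bgr[1] = DemoClamp255((c - 100 * d - 208 * e + 128) / 256);
  bgr[2] = DemoClamp255((c + 409 * e + 128) / 256);
}

static uint16_t DemoPack565(uint8_t b, uint8_t g, uint8_t r) {
  return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Step 1 of the wrapper: NV12 row to a full ARGB row. */
static void DemoNV12ToARGBRow(const uint8_t* src_y, const uint8_t* src_uv,
                              uint8_t* dst_argb, int width) {
  for (int x = 0; x < width; ++x) {
    DemoNV12Pixel(src_y, src_uv, x, dst_argb + 4 * x);
    dst_argb[4 * x + 3] = 255;
  }
}

/* Step 2 of the wrapper: ARGB row to RGB565. */
static void DemoARGBToRGB565Row(const uint8_t* src_argb, uint16_t* dst,
                                int width) {
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src_argb + 4 * x;
    dst[x] = DemoPack565(p[0], p[1], p[2]);
  }
}

/* Multi-step wrapper: two passes over the row through a temporary buffer. */
static void DemoNV12ToRGB565Row_TwoStep(const uint8_t* src_y,
                                        const uint8_t* src_uv, uint16_t* dst,
                                        int width) {
  uint8_t row[kDemoMaxWidth * 4];
  assert(width > 0 && width <= kDemoMaxWidth);
  DemoNV12ToARGBRow(src_y, src_uv, row, width);
  DemoARGBToRGB565Row(row, dst, width);
}

/* Direct conversion: read, convert and store in a single pass. */
static void DemoNV12ToRGB565Row_Direct(const uint8_t* src_y,
                                       const uint8_t* src_uv, uint16_t* dst,
                                       int width) {
  assert(width > 0);
  for (int x = 0; x < width; ++x) {
    uint8_t bgr[3];
    DemoNV12Pixel(src_y, src_uv, x, bgr);
    dst[x] = DemoPack565(bgr[0], bgr[1], bgr[2]);
  }
}
```

Both versions produce identical output; the direct one simply avoids writing and re-reading the intermediate ARGB row.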

Original comment by fbarch...@google.com on 18 Feb 2015 at 6:47

GoogleCodeExporter commented 9 years ago
Conversion grid test results for all direct-path conversions:

Was
I420ToARGB_Opt (750 ms)
I420ToBGRA_Opt (781 ms)
I420ToRGBA_Opt (766 ms)
I420ToABGR_Opt (766 ms)
I420ToRGB24_Opt (1375 ms)
I420ToRAW_Opt (1375 ms)
I420ToRGB565_Opt (1844 ms)
I420ToARGB1555_Opt (2157 ms)
I420ToARGB4444_Opt (1750 ms)
I420ToRGB565_Opt (1844 ms)
I422ToARGB_Opt (750 ms)
I444ToARGB_Opt (985 ms)
I411ToARGB_Opt (1125 ms)
NV12ToARGB_Opt (1000 ms)
NV21ToARGB_Opt (1015 ms)
YUY2ToARGB_Opt (1469 ms)
UYVYToARGB_Opt (1469 ms)
J422ToARGB_Opt (10797 ms)
I400ToARGB_Opt (407 ms)
UYVYToARGB_Opt (1469 ms)
YToARGB_Opt (406 ms)
NV12ToRGB565_Opt (1969 ms)
NV21ToRGB565_Opt (1969 ms)

Now
I420ToARGB_Opt (735 ms)
I420ToBGRA_Opt (781 ms)
I420ToRGBA_Opt (750 ms)
I420ToABGR_Opt (750 ms)
I420ToRGB24_Opt (1391 ms)
I420ToRAW_Opt (1391 ms)
I420ToRGB565_Opt (1203 ms)
I420ToARGB1555_Opt (1312 ms)
I420ToARGB4444_Opt (1078 ms)
I420ToRGB565_Opt (1203 ms)
I422ToARGB_Opt (750 ms)
I444ToARGB_Opt (1000 ms)
I411ToARGB_Opt (1140 ms)
NV12ToARGB_Opt (703 ms)
NV21ToARGB_Opt (703 ms)
YUY2ToARGB_Opt (1000 ms)
UYVYToARGB_Opt (1000 ms)
J422ToARGB_Opt (10813 ms)
I400ToARGB_Opt (406 ms)
UYVYToARGB_Opt (1000 ms)
YToARGB_Opt (406 ms)
NV12ToRGB565_Opt (1157 ms)
NV21ToRGB565_Opt (1156 ms)
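
To compare the two lists, the relative change can be computed as the earlier time divided by the later one. A minimal sketch over a few rows copied from the lists above; the struct, table and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
  const char* name;
  int before_ms; /* from the "Was" list */
  int after_ms;  /* from the "Now" list */
} SketchTiming;

/* A subset of the timings reported above. */
static const SketchTiming kSketchTimings[] = {
    {"I420ToRGB565_Opt", 1844, 1203},
    {"I420ToARGB1555_Opt", 2157, 1312},
    {"NV12ToARGB_Opt", 1000, 703},
    {"YUY2ToARGB_Opt", 1469, 1000},
    {"NV12ToRGB565_Opt", 1969, 1157},
};

/* Values above 1.0 mean the later run was faster. */
static double SketchSpeedup(int before_ms, int after_ms) {
  assert(after_ms > 0);
  return (double)before_ms / (double)after_ms;
}

static void SketchPrintSpeedups(FILE* out) {
  size_t count = sizeof(kSketchTimings) / sizeof(kSketchTimings[0]);
  for (size_t i = 0; i < count; ++i) {
    const SketchTiming* t = &kSketchTimings[i];
    fprintf(out, "%-20s %5d ms -> %5d ms  (%.2fx)\n", t->name, t->before_ms,
            t->after_ms, SketchSpeedup(t->before_ms, t->after_ms));
  }
}
```

Calling SketchPrintSpeedups(stdout) from a main function prints one line per conversion.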

Original comment by fbarch...@google.com on 26 Feb 2015 at 8:49

GoogleCodeExporter commented 9 years ago
All functions are now ported to AVX2:

I420ToARGB_Opt (750 ms)
I420ToBGRA_Opt (750 ms)
I420ToRGBA_Opt (750 ms)
I420ToABGR_Opt (750 ms)
I420ToRGB24_Opt (1156 ms)
I420ToRAW_Opt (1141 ms)
I420ToRGB565_Opt (1234 ms)
I420ToARGB1555_Opt (1328 ms)
I420ToARGB4444_Opt (1062 ms)
I420ToRGB565_Opt (1234 ms)
I422ToARGB_Opt (766 ms)
I444ToARGB_Opt (792 ms)
I411ToARGB_Opt (719 ms)
NV12ToARGB_Opt (719 ms)
NV21ToARGB_Opt (719 ms)
YUY2ToARGB_Opt (1016 ms)
UYVYToARGB_Opt (1007 ms)
I400ToARGB_Opt (443 ms)
NV12ToRGB565_Opt (1187 ms)
NV21ToRGB565_Opt (1188 ms)
ARGBToARGB_Opt (456 ms)
J422ToARGB_Opt (750 ms)

Some of these conversions still take 2 or 3 steps. NV12ToARGB is now a single AVX2 step.

Original comment by fbarch...@chromium.org on 17 Mar 2015 at 11:24