2753536587 / libyuv

Automatically exported from code.google.com/p/libyuv

I420ToRGB24 optimize #116

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Currently (r394) I420ToRGB24 is a 2 step conversion.
1. Each row is converted to ARGB with SSSE3/NEON
2. Each row of ARGB is converted to RGB24.

On Neon, RGB24/RAW are easy to output, so do the conversion as one step.
On SSSE3, ARGBToRGB24 is unrolled to 48 RGB bytes for alignment and doesn't mesh as well with a one-step conversion.
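
For illustration, the difference between the two-step and one-step paths can be sketched in plain scalar C. This is not libyuv's code: the function names, the BT.601 limited-range constants, and the caller-supplied scratch row are all assumptions made for the example. It also assumes ARGB is stored as B, G, R, A in memory and RGB24 as B, G, R. The real rows use SSSE3/Neon, but the data flow is the same: the two-step path writes a full ARGB row and then reads it back to drop alpha, while the one-step path writes the 3 bytes per pixel directly.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 byte range. */
static uint8_t clamp_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Assumed BT.601 limited-range fixed-point math (illustrative only).
 * Writes 3 bytes in B, G, R order. */
static void yuv_to_bgr(uint8_t y, uint8_t u, uint8_t v, uint8_t* bgr) {
  int c = y - 16, d = u - 128, e = v - 128;
  bgr[0] = clamp_u8((298 * c + 516 * d + 128) / 256);           /* B */
  bgr[1] = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256); /* G */
  bgr[2] = clamp_u8((298 * c + 409 * e + 128) / 256);           /* R */
}

/* Step 1 of the two-step path: one Y row plus half-width U/V rows to ARGB. */
static void row_to_argb(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                        uint8_t* argb, int width) {
  for (int x = 0; x < width; ++x) {
    yuv_to_bgr(y[x], u[x / 2], v[x / 2], argb + 4 * x);
    argb[4 * x + 3] = 255; /* opaque alpha */
  }
}

/* Step 2 of the two-step path: drop alpha, 4 bytes per pixel down to 3. */
static void argb_row_to_rgb24(const uint8_t* argb, uint8_t* rgb24, int width) {
  for (int x = 0; x < width; ++x) {
    rgb24[3 * x + 0] = argb[4 * x + 0];
    rgb24[3 * x + 1] = argb[4 * x + 1];
    rgb24[3 * x + 2] = argb[4 * x + 2];
  }
}

/* One-step row: convert and store 3 bytes per pixel, no ARGB round trip. */
static void row_to_rgb24(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                         uint8_t* rgb24, int width) {
  for (int x = 0; x < width; ++x) {
    yuv_to_bgr(y[x], u[x / 2], v[x / 2], rgb24 + 3 * x);
  }
}

/* Hypothetical two-step frame conversion; scratch_argb holds width * 4 bytes. */
void i420_to_rgb24_two_step(const uint8_t* y, int y_stride,
                            const uint8_t* u, int u_stride,
                            const uint8_t* v, int v_stride,
                            uint8_t* rgb24, int rgb24_stride,
                            uint8_t* scratch_argb, int width, int height) {
  assert(width > 0 && height > 0);
  for (int row = 0; row < height; ++row) {
    /* I420 chroma is subsampled 2x vertically, so two Y rows share a U/V row. */
    row_to_argb(y + row * y_stride, u + (row / 2) * u_stride,
                v + (row / 2) * v_stride, scratch_argb, width);
    argb_row_to_rgb24(scratch_argb, rgb24 + row * rgb24_stride, width);
  }
}

/* Hypothetical one-step frame conversion; no scratch row needed. */
void i420_to_rgb24_one_step(const uint8_t* y, int y_stride,
                            const uint8_t* u, int u_stride,
                            const uint8_t* v, int v_stride,
                            uint8_t* rgb24, int rgb24_stride,
                            int width, int height) {
  assert(width > 0 && height > 0);
  for (int row = 0; row < height; ++row) {
    row_to_rgb24(y + row * y_stride, u + (row / 2) * u_stride,
                 v + (row / 2) * v_stride, rgb24 + row * rgb24_stride, width);
  }
}
```

Both functions should produce identical output. The one-step version only saves the intermediate ARGB store and reload per row, which is the saving this issue is after.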

Original issue reported on code.google.com by fbarch...@google.com on 9 Oct 2012 at 12:30

GoogleCodeExporter commented 9 years ago
r396 does 1 step I420ToRGB24 and I420ToRAW, optimized for Neon, but not SSSE3.

Original comment by fbarch...@google.com on 9 Oct 2012 at 12:30

GoogleCodeExporter commented 9 years ago
r399 ports Neon back to SSSE3 for Windows:
I420ToRGB24_OptVsC (932 ms)
I420ToRAW_OptVsC (921 ms)

Original comment by fbarch...@google.com on 9 Oct 2012 at 5:34

GoogleCodeExporter commented 9 years ago
r400 optimized RGBA on Linux/Mac but not RGB24/RAW.
1 pass SSSE3:
I420ToARGB_OptVsC (960 ms)
I420ToBGRA_OptVsC (947 ms)
I420ToABGR_OptVsC (974 ms)
I420ToRGBA_OptVsC (949 ms)
1 pass C (was 2 pass SSSE3):
I420ToRGB24_OptVsC (12750 ms)
I420ToRAW_OptVsC (12842 ms)
2 pass SSSE3:
I420ToRGB565_OptVsC (1453 ms)
I420ToARGB1555_OptVsC (1592 ms)
I420ToARGB4444_OptVsC (1285 ms)

Original comment by fbarch...@google.com on 9 Oct 2012 at 9:15

GoogleCodeExporter commented 9 years ago
Fixed in r402.
1 pass SSSE3:
I420ToARGB_Opt (967 ms)
I420ToBGRA_Opt (935 ms)
I420ToABGR_Opt (950 ms)
I420ToRGBA_Opt (933 ms)
I420ToRAW_Opt (1170 ms)
I420ToRGB24_Opt (1167 ms)
I422ToARGB_Opt (1011 ms)
I422ToBGRA_Opt (1011 ms)
I422ToABGR_Opt (1020 ms)
I422ToRGBA_Opt (1010 ms)
I411ToARGB_Opt (1025 ms)
I444ToARGB_Opt (1299 ms)
I400ToARGB_Opt (457 ms)
NV12ToARGB_Opt (885 ms)
NV21ToARGB_Opt (870 ms)
M420ToARGB_Opt (956 ms)

2 pass SSSE3:
I420ToRGB565_Opt (1444 ms)
I420ToARGB1555_Opt (1582 ms)
I420ToARGB4444_Opt (1277 ms)
I420ToBayerBGGR_Opt (1284 ms)
I420ToBayerRGGB_Opt (1283 ms)
I420ToBayerGBRG_Opt (1283 ms)
I420ToBayerGRBG_Opt (1284 ms)
NV12ToRGB565_Opt (1382 ms)
NV21ToRGB565_Opt (1383 ms)
YUY2ToARGB_Opt (1356 ms)

Original comment by fbarch...@chromium.org on 11 Oct 2012 at 1:23