biotrump / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

Add 4:2:0 YUVA to ARGB & ABGR conversion for SkCanvasVideoRenderer. #473

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
For this function:
https://code.google.com/p/chromium/codesearch#chromium/src/media/blink/skcanvas_video_renderer.cc&l=581

This could be done as a YV12 -> ARGB/ABGR conversion, followed by copying the alpha plane and attenuating RGB by alpha.

Original issue reported on code.google.com by dalecurtis@chromium.org on 24 Jul 2015 at 9:47

GoogleCodeExporter commented 9 years ago
There's an SkCanvasVideoRenderer unittest in media_blink_unittests

Original comment by dalecurtis@chromium.org on 24 Jul 2015 at 10:00

GoogleCodeExporter commented 9 years ago
There are 3 ways to implement this:

1. Caller can do 2 steps: I420ToARGB and ARGBCopyYToAlpha   

ARGBCopyYToAlpha takes a plane of data and copies it to the 4th byte of each ARGB pixel.
See planar_functions.h:
LIBYUV_API
int ARGBCopyYToAlpha(const uint8* src_y, int src_stride_y,
                     uint8* dst_argb, int dst_stride_argb,
                     int width, int height);

Advantages:
a. allows any form of I420 conversion, including I422, J420 and ABGR destination.
b. works with existing libyuv.
c. both functions are highly optimized, including AVX2.
Disadvantage:
a. less cache/memory friendly for large images where the ARGB destination doesn't fit in cache.
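
To make the second step of option 1 concrete, here is a minimal scalar sketch of what copying a plane into the alpha byte amounts to. It is not libyuv's implementation (which has optimized row functions); the name CopyPlaneToAlphaSketch is hypothetical, and it assumes the usual libyuv ARGB memory layout of B, G, R, A with alpha at byte 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference (not libyuv's code): copy an 8-bit plane
 * into byte 3 (alpha) of each 4-byte ARGB pixel, leaving B, G, R alone. */
static void CopyPlaneToAlphaSketch(const uint8_t* src_a, int src_stride_a,
                                   uint8_t* dst_argb, int dst_stride_argb,
                                   int width, int height) {
  assert(src_a && dst_argb && width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      dst_argb[x * 4 + 3] = src_a[x]; /* overwrite only the alpha byte */
    }
    src_a += src_stride_a;       /* strides are in bytes */
    dst_argb += dst_stride_argb;
  }
}
```

Because this pass walks the whole ARGB image a second time, it shows where the cache cost noted in the disadvantage comes from.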

2. Implement A420ToARGB (I420 with alpha), internally doing 2 steps per row.
Advantages:
a. faster - using a row buffer for intermediate ARGB is cache friendly.
b. abstracts implementation, which can be improved in future.
Disadvantage:
a. implements a specific color space, so it is less flexible.

3. Implement optimized A420ToARGB.
Internally the I420ToARGB is done with 3 macros to implement the I420 fetch, 
YUV conversion, and ARGB storing.  The ARGB storing fills in 255.  This macro 
could implement a variation that fetches alpha from another pointer.
Advantage: fastest
Disadvantage: most complex, least flexible.
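
As a rough illustration of option 3, the sketch below converts one row in a single pass and stores alpha from a fourth pointer instead of writing 255. It is plain scalar C, not the macro-based assembly described above. The function names are hypothetical, and the fixed-point BT.601 coefficients (298, 409, 100, 208, 516) are a commonly used approximation assumed here, not necessarily the constants libyuv uses.

```c
#include <assert.h>
#include <stdint.h>

/* Shift an 8.8 fixed-point value down to 8 bits and clamp to 0..255. */
static uint8_t ClampShift8(int v) {
  if (v < 0) return 0;
  v >>= 8;
  return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical single-pass row: fetch Y, U, V and A, convert, and store
 * B, G, R, A. U and V are subsampled 2:1 horizontally, as in I420. */
static void I420AlphaRowSketch(const uint8_t* src_y, const uint8_t* src_u,
                               const uint8_t* src_v, const uint8_t* src_a,
                               uint8_t* dst_argb, int width) {
  assert(src_y && src_u && src_v && src_a && dst_argb);
  for (int x = 0; x < width; ++x) {
    int c = 298 * (src_y[x] - 16) + 128; /* luma term plus rounding */
    int d = src_u[x / 2] - 128;
    int e = src_v[x / 2] - 128;
    dst_argb[0] = ClampShift8(c + 516 * d);           /* B */
    dst_argb[1] = ClampShift8(c - 100 * d - 208 * e); /* G */
    dst_argb[2] = ClampShift8(c + 409 * e);           /* R */
    dst_argb[3] = src_a[x]; /* alpha from its own plane, not 255 */
    dst_argb += 4;
  }
}
```

The only difference from an opaque conversion is the last store, which is why the store step is the natural place for an alpha variant.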

Original comment by fbarch...@chromium.org on 11 Aug 2015 at 6:03

GoogleCodeExporter commented 9 years ago
I420AlphaToARGB implemented in r1466

I420AlphaToARGB_Any (1373 ms)
I420AlphaToARGB_Unaligned (1625 ms)
I420AlphaToARGB_Invert (1303 ms)
I420AlphaToARGB_Opt (1302 ms)

I420ToARGB_Any (660 ms)
I420ToARGB_Unaligned (637 ms)
I420ToARGB_Invert (615 ms)
I420ToARGB_Opt (542 ms)

Original comment by fbarch...@chromium.org on 18 Aug 2015 at 6:25

GoogleCodeExporter commented 9 years ago
The change is integrated into Chrome.

A follow-up improvement would be I420AlphaToABGR and some performance improvements.

Original comment by fbarch...@chromium.org on 20 Aug 2015 at 11:32

GoogleCodeExporter commented 9 years ago
This is the CL that switches to libyuv::I420AlphaToARGB
https://codereview.chromium.org/1293293003/

Original comment by fbarch...@chromium.org on 21 Aug 2015 at 12:49

GoogleCodeExporter commented 9 years ago
Starting ABGR version.

Original comment by fbarch...@chromium.org on 21 Aug 2015 at 1:07

GoogleCodeExporter commented 9 years ago
Performance (on Android) went from 2 steps:
SkCanvasVideoRendererTest.TransparentFrame (1610 ms)
to 1 step:
SkCanvasVideoRendererTest.TransparentFrame (1256 ms)
That is about 22% less time.

Original comment by fbarch...@chromium.org on 27 Aug 2015 at 1:25

GoogleCodeExporter commented 9 years ago
ABGR integrated into skcanvas.

consider merging ARGB and ABGR functions into a single function and exposing 
other color spaces - BGRA, J420 etc.

consider removing premultiplication and have renderer do unattenuated alpha 
blend.

consider implementing I420AlphaToARGB internally as a single assembly function, 
which fills in alpha as it goes, instead of filling in 255 and then doing a 2nd 
step to copy, and/or a third step to attenuate RGB by alpha.
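
For reference, the attenuate step mentioned here is a premultiply of B, G and R by alpha. A minimal scalar sketch follows, assuming exact rounded division by 255; the name AttenuateRowSketch is hypothetical, and libyuv's optimized attenuate code may round differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: premultiply B, G, R by A in place for one row.
 * Uses (c * a + 127) / 255, i.e. rounded exact scaling. */
static void AttenuateRowSketch(uint8_t* argb, int width) {
  assert(argb && width > 0);
  for (int x = 0; x < width; ++x) {
    unsigned a = argb[3];
    argb[0] = (uint8_t)((argb[0] * a + 127) / 255); /* B */
    argb[1] = (uint8_t)((argb[1] * a + 127) / 255); /* G */
    argb[2] = (uint8_t)((argb[2] * a + 127) / 255); /* R */
    argb += 4; /* alpha itself is left unchanged */
  }
}
```

Folding this multiply into the store of a single conversion function would avoid the extra pass over the destination.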

Original comment by fbarch...@chromium.org on 27 Aug 2015 at 1:40

GoogleCodeExporter commented 9 years ago
3 followup changes under consideration

1. Refactor I420AlphaToABGR to use I420AlphaToARGB internally, but swap U and V 
and transpose the conversion matrix, so I420AlphaToARGB and I420AlphaToABGR share 
low level code.
2. Port to ARMv7 Neon and aarch64 Neon.
3. Now that the row function is similar to the old MMX code, introduce a benchmark in 
chromium media.
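
A small sketch of why item 1 can work: if one pixel routine writes channels in a fixed order from (U, V) and a coefficient set, then passing V in place of U, with the coefficient roles swapped to match, yields the same colors in reversed R/B byte order. Everything below is hypothetical illustration, not libyuv code. The struct, function names and BT.601 fixed-point coefficients are assumptions, and "transpose the conversion matrix" is interpreted here as swapping the U and V coefficient roles.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int ub, ug, vg, vr; } YuvCoefSketch; /* hypothetical */

/* Shift an 8.8 fixed-point value down to 8 bits and clamp to 0..255. */
static uint8_t Clamp8(int v) {
  if (v < 0) return 0;
  v >>= 8;
  return (uint8_t)(v > 255 ? 255 : v);
}

/* Shared low-level routine: byte 0 depends on u, byte 2 on v. */
static void YuvPixelSketch(uint8_t y, uint8_t u, uint8_t v, uint8_t a,
                           const YuvCoefSketch* k, uint8_t* dst) {
  assert(k && dst);
  int c = 298 * (y - 16) + 128;
  int d = u - 128;
  int e = v - 128;
  dst[0] = Clamp8(c + k->ub * d);
  dst[1] = Clamp8(c - k->ug * d - k->vg * e);
  dst[2] = Clamp8(c + k->vr * e);
  dst[3] = a;
}

static const YuvCoefSketch kArgbCoef = {516, 100, 208, 409};
static const YuvCoefSketch kAbgrCoef = {409, 208, 100, 516}; /* roles swapped */

/* ARGB in memory is B, G, R, A. */
static void PixelToARGBSketch(uint8_t y, uint8_t u, uint8_t v, uint8_t a,
                              uint8_t* dst) {
  YuvPixelSketch(y, u, v, a, &kArgbCoef, dst);
}

/* ABGR in memory is R, G, B, A: same routine, U and V swapped. */
static void PixelToABGRSketch(uint8_t y, uint8_t u, uint8_t v, uint8_t a,
                              uint8_t* dst) {
  YuvPixelSketch(y, v, u, a, &kAbgrCoef, dst);
}
```

Both wrappers call the same routine, so only the argument order and the constant table differ between the two output formats.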

Original comment by fbarch...@chromium.org on 27 Oct 2015 at 5:55