Gamaru / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

ARGBToRGB565 neon use vsri #571

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
The aarch64 version uses the sri (shift right and insert) instruction to shift
and merge the channels in one step, saving instructions.
Backport this to 32-bit NEON.

Original issue reported on code.google.com by fbarch...@google.com on 24 Feb 2016 at 3:00

GoogleCodeExporter commented 8 years ago
This tutorial covers this particular conversion:
https://community.arm.com/groups/processors/blog/2010/09/01/coding-for-neon--part-4-shifting-left-and-right

This is a port of the row_neon64.cc version of this function, so the
differences between the 32-bit and 64-bit code are minor.

This is the existing 64-bit code:
#define ARGBTORGB565                                             \
    "shll       v0.8h,  v22.8b, #8             \n"  /* R   */ \
    "shll       v21.8h, v21.8b, #8             \n"  /* G   */ \
    "shll       v20.8h, v20.8b, #8             \n"  /* B   */ \
    "sri        v0.8h,  v21.8h, #5             \n"  /* RG  */ \
    "sri        v0.8h,  v20.8h, #11            \n"  /* RGB */

This is the 32-bit port:
#define ARGBTORGB565                                             \
    "vshll.u8    q0, d22, #8                   \n"  /* R   */ \
    "vshll.u8    q8, d21, #8                   \n"  /* G   */ \
    "vshll.u8    q9, d20, #8                   \n"  /* B   */ \
    "vsri.16     q0, q8, #5                    \n"  /* RG  */ \
    "vsri.16     q0, q9, #11                   \n"  /* RGB */

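vsri (shift right and insert) shifts each element of the source right by an
immediate n and inserts the result into the destination, leaving the top n
bits of each destination element unchanged.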
e.g.
q0 rrrr_rrrr_0000_0000
q8 gggg_gggg_0000_0000

vsri.16     q0, q8, #5
shifts q8 (G) right by 5
q8 0000_0ggg_gggg_g000
then inserts it, keeping the top 5 bits of q0
q0 rrrr_rggg_gggg_g000

vsri.16     q0, q9, #11
then takes B
q9 bbbb_bbbb_0000_0000
shifts down by 11
q9 0000_0000_000b_bbbb
and inserts it, keeping the top 11 bits of q0
q0 rrrr_rggg_gggb_bbbb
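To check the walk-through, the sketch below models one 16-bit lane in plain C. `shll8` and `sri16` are hypothetical helpers written for illustration (they are not NEON intrinsics or libyuv functions). The sketch compares the two-vsri sequence with the conventional mask-and-shift packing over every 24-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane; helper names are
   illustrative, not NEON intrinsics or libyuv functions. */

/* vshll.u8 #8: widen a byte into the top half of a 16-bit lane. */
static uint16_t shll8(uint8_t x) {
  return (uint16_t)((unsigned)x << 8);
}

/* vsri.16 #n: shift src right by n and insert it into dst,
   keeping the top n bits of dst (valid for 1 <= n <= 16). */
static uint16_t sri16(uint16_t dst, uint16_t src, unsigned n) {
  uint16_t keep = (uint16_t)(0xFFFFu << (16 - n));
  return (uint16_t)((dst & keep) | (src >> n));
}

/* The ARGBTORGB565 sequence for one pixel. */
uint16_t rgb565_vsri(uint8_t r, uint8_t g, uint8_t b) {
  uint16_t q0 = shll8(r);  /* rrrr_rrrr_0000_0000 */
  uint16_t q8 = shll8(g);  /* gggg_gggg_0000_0000 */
  uint16_t q9 = shll8(b);  /* bbbb_bbbb_0000_0000 */
  q0 = sri16(q0, q8, 5);   /* rrrr_rggg_gggg_g000 */
  q0 = sri16(q0, q9, 11);  /* rrrr_rggg_gggb_bbbb */
  return q0;
}

/* Conventional mask-and-shift packing, used as the reference. */
uint16_t rgb565_reference(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Checks pure-channel cases, then all 2^24 inputs; returns 1 on success. */
int rgb565_self_test(void) {
  assert(rgb565_vsri(0xFF, 0x00, 0x00) == 0xF800);
  assert(rgb565_vsri(0x00, 0xFF, 0x00) == 0x07E0);
  assert(rgb565_vsri(0x00, 0x00, 0xFF) == 0x001F);
  for (unsigned r = 0; r < 256; ++r)
    for (unsigned g = 0; g < 256; ++g)
      for (unsigned b = 0; b < 256; ++b)
        if (rgb565_vsri((uint8_t)r, (uint8_t)g, (uint8_t)b) !=
            rgb565_reference((uint8_t)r, (uint8_t)g, (uint8_t)b))
          return 0;
  return 1;
}
```

Because vsri keeps the destination's top n bits, the 8 G bits inserted by the first vsri are trimmed to 6 by the second one. That is why no separate mask is needed.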

If ARGB4444 were done the same way, it would take 7 instructions, the same as
the current code.

Current code:
#define ARGBTOARGB4444                                            \
    "vshr.u8    d20, d20, #4                   \n"  /* B    */ \
    "vbic.32    d21, d21, d4                   \n"  /* G    */ \
    "vshr.u8    d22, d22, #4                   \n"  /* R    */ \
    "vbic.32    d23, d23, d4                   \n"  /* A    */ \
    "vorr       d0, d20, d21                   \n"  /* BG   */ \
    "vorr       d1, d22, d23                   \n"  /* RA   */ \
    "vzip.u8    d0, d1                         \n"  /* BGRA */
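For comparison with the vsri variants, here is a hypothetical scalar model of what the current sequence computes for one pixel. It assumes d4 holds 0x0f in every byte, because the code that loads d4 is not part of the snippet; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: d4 holds 0x0f in every byte, so vbic clears the low nibble. */
static const unsigned kD4Mask = 0x0fu;

/* Hypothetical per-pixel model of the current ARGBTOARGB4444 sequence. */
uint16_t argb4444_current(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  uint8_t b4 = (uint8_t)(b >> 4);        /* vshr.u8 d20, d20, #4 */
  uint8_t g4 = (uint8_t)(g & ~kD4Mask);  /* vbic.32 d21, d21, d4 */
  uint8_t r4 = (uint8_t)(r >> 4);        /* vshr.u8 d22, d22, #4 */
  uint8_t a4 = (uint8_t)(a & ~kD4Mask);  /* vbic.32 d23, d23, d4 */
  uint8_t bg = (uint8_t)(b4 | g4);       /* vorr d0: gggg_bbbb */
  uint8_t ra = (uint8_t)(r4 | a4);       /* vorr d1: aaaa_rrrr */
  /* vzip.u8 d0, d1 interleaves the bytes, so each little-endian 16-bit
     pixel has gggg_bbbb in its low byte and aaaa_rrrr in its high byte. */
  return (uint16_t)((ra << 8) | bg);
}

/* Reference: keep the top nibble of each channel, packed as A R G B. */
uint16_t argb4444_reference(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((a >> 4) << 12) | ((r >> 4) << 8) | ((g >> 4) << 4) |
                    (b >> 4));
}

/* Sweeps 2^16 inputs in which every channel takes every byte value. */
int argb4444_current_self_test(void) {
  assert(argb4444_current(0x12, 0x34, 0x56, 0x78) == 0x1357);
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t a = (uint8_t)x, r = (uint8_t)y;
      uint8_t g = (uint8_t)(x ^ 0x5a), b = (uint8_t)(y ^ 0xa5);
      if (argb4444_current(a, r, g, b) != argb4444_reference(a, r, g, b))
        return 0;
    }
  return 1;
}
```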
The same conversion done with vsri:
#define ARGBTOARGB4444                                            \
    "vshll.u8    q0, d23, #8                   \n"  /* A    */ \
    "vshll.u8    q8, d22, #8                   \n"  /* R    */ \
    "vshll.u8    q9, d21, #8                   \n"  /* G    */ \
    "vshll.u8    q10, d20, #8                  \n"  /* B    */ \
    "vsri.16     q0, q8, #4                    \n"  /* AR   */ \
    "vsri.16     q0, q9, #8                    \n"  /* ARG  */ \
    "vsri.16     q0, q10, #12                  \n"  /* ARGB */
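A hypothetical scalar model of one 16-bit lane of this vsri.16 variant can confirm the bit layout. As before, `shll8` and `sri16` are illustrative helpers, not NEON intrinsics, and the reference packing is the usual top-nibble-per-channel ARGB4444 layout.

```c
#include <assert.h>
#include <stdint.h>

/* vshll.u8 #8: widen a byte into the top half of a 16-bit lane. */
static uint16_t shll8(uint8_t x) { return (uint16_t)((unsigned)x << 8); }

/* vsri.16 #n: keep the top n bits of dst, insert src >> n below them. */
static uint16_t sri16(uint16_t dst, uint16_t src, unsigned n) {
  uint16_t keep = (uint16_t)(0xFFFFu << (16 - n));
  return (uint16_t)((dst & keep) | (src >> n));
}

/* Hypothetical model of the vsri.16 ARGBTOARGB4444 sequence. */
uint16_t argb4444_vsri16(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  uint16_t q0 = shll8(a);   /* aaaa_aaaa_0000_0000 */
  uint16_t q8 = shll8(r);
  uint16_t q9 = shll8(g);
  uint16_t q10 = shll8(b);
  q0 = sri16(q0, q8, 4);    /* aaaa_rrrr_rrrr_0000 */
  q0 = sri16(q0, q9, 8);    /* aaaa_rrrr_gggg_gggg */
  q0 = sri16(q0, q10, 12);  /* aaaa_rrrr_gggg_bbbb */
  return q0;
}

/* Reference: keep the top nibble of each channel, packed as A R G B. */
uint16_t argb4444_reference(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((a >> 4) << 12) | ((r >> 4) << 8) | ((g >> 4) << 4) |
                    (b >> 4));
}

/* Sweeps 2^16 inputs in which every channel takes every byte value. */
int argb4444_vsri16_self_test(void) {
  assert(argb4444_vsri16(0x12, 0x34, 0x56, 0x78) == 0x1357);
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t a = (uint8_t)x, r = (uint8_t)y;
      uint8_t g = (uint8_t)(x ^ 0x5a), b = (uint8_t)(y ^ 0xa5);
      if (argb4444_vsri16(a, r, g, b) != argb4444_reference(a, r, g, b))
        return 0;
    }
  return 1;
}
```

Each vsri.16 keeps one more nibble of the accumulated result, so the shift amounts step by 4 (4, 8, 12).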

But it could be done on 8-bit values instead, packing two 4-bit channels into
each byte:
#define ARGBTOARGB4444                                             \
    "vsri.8      d23, d22, #4                  \n"  /* AR    */ \
    "vsri.8      d21, d20, #4                  \n"  /* GB    */ \
    "vzip.u8     d21, d23                      \n"  /* ARGB  */ \
    "vmov        d0, d21                       \n"              \
    "vmov        d1, d23                       \n"
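The byte-wise variant can be modelled the same way. In this hypothetical sketch, `sri8` stands in for vsri.8 on a single byte lane, and the zip step is represented by reading the interleaved bytes as a little-endian 16-bit pixel; the names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* vsri.8 #n on one byte: keep the top n bits of dst, insert src >> n. */
static uint8_t sri8(uint8_t dst, uint8_t src, unsigned n) {
  uint8_t keep = (uint8_t)(0xFFu << (8 - n));
  return (uint8_t)((dst & keep) | (src >> n));
}

/* Hypothetical model of the vsri.8 ARGBTOARGB4444 sequence. */
uint16_t argb4444_vsri8(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  uint8_t ar = sri8(a, r, 4);  /* vsri.8 d23, d22, #4 -> aaaa_rrrr */
  uint8_t gb = sri8(g, b, 4);  /* vsri.8 d21, d20, #4 -> gggg_bbbb */
  /* vzip.u8 d21, d23 interleaves gb and ar bytes; read as little-endian
     16-bit pixels, gb is the low byte and ar the high byte. The two vmov
     instructions only copy the result into d0/d1. */
  return (uint16_t)((ar << 8) | gb);
}

/* Reference: keep the top nibble of each channel, packed as A R G B. */
uint16_t argb4444_reference(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((a >> 4) << 12) | ((r >> 4) << 8) | ((g >> 4) << 4) |
                    (b >> 4));
}

/* Sweeps 2^16 inputs in which every channel takes every byte value. */
int argb4444_vsri8_self_test(void) {
  assert(argb4444_vsri8(0x12, 0x34, 0x56, 0x78) == 0x1357);
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      uint8_t a = (uint8_t)x, r = (uint8_t)y;
      uint8_t g = (uint8_t)(x ^ 0x5a), b = (uint8_t)(y ^ 0xa5);
      if (argb4444_vsri8(a, r, g, b) != argb4444_reference(a, r, g, b))
        return 0;
    }
  return 1;
}
```

In this model the two byte-wise vsri.8 instructions do the nibble packing that the vshr/vbic/vorr pairs do in the current code.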

Original comment by fbarch...@google.com on 25 Feb 2016 at 1:25

GoogleCodeExporter commented 8 years ago
The following revision refers to this bug:
  https://chromium.googlesource.com/libyuv/libyuv.git/+/ee99b85126aeafe64ba3da8f28aafcac80a595ac

commit ee99b85126aeafe64ba3da8f28aafcac80a595ac
Author: Frank Barchard <fbarchard@google.com>
Date: Mon Feb 29 20:22:25 2016

Port ARGBToRGB565 from aarch64 neon to 32 bit

Port the 64 bit version of ARGBToRGB565 to 32 bit. 64 bit uses sri, which
shifts and inserts, saving some masking. The instruction is available for
32 bit neon as well.

R=magjed@chromium.org, harryjin@google.com
BUG=libyuv:571

Review URL: https://codereview.chromium.org/1724393002 .

[modify] https://crrev.com/ee99b85126aeafe64ba3da8f28aafcac80a595ac/README.chromium
[modify] https://crrev.com/ee99b85126aeafe64ba3da8f28aafcac80a595ac/include/libyuv/version.h
[modify] https://crrev.com/ee99b85126aeafe64ba3da8f28aafcac80a595ac/source/row_neon.cc
[modify] https://crrev.com/ee99b85126aeafe64ba3da8f28aafcac80a595ac/source/row_neon64.cc

Original comment by bugdroid1@chromium.org on 29 Feb 2016 at 8:22

GoogleCodeExporter commented 8 years ago

Original comment by fbarch...@google.com on 29 Feb 2016 at 8:31