KerwinMa / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

Neon aligned memory #109

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Neon performance would benefit from aligned vld/vst.
e.g.
vld1.8 {q0},[r0,:128]
The downside is that the user (e.g. webrtc) is responsible for providing memory and may
pass unaligned memory. This is usually unintentional, but there are sometimes
legitimate cases for unaligned memory.
So some functions need two versions: one for aligned memory and one for
unaligned memory. And if odd-width performance matters, an 'any' version can make use of
the unaligned version.
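
To illustrate the 'any' idea, here is a minimal sketch in portable C. It assumes the unaligned kernel handles 16 bytes per iteration and that the leftover pixels go through a scalar loop; the names `InvertRow_Unaligned` and `InvertRow_Any` and the invert operation are hypothetical stand-ins, not libyuv functions, and the inner loop only marks where the Neon vld1/vst1 code would sit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Neon kernel built on unaligned vld1/vst1.
 * It handles 16 bytes per iteration, so width must be a multiple of 16. */
static void InvertRow_Unaligned(const uint8_t* src, uint8_t* dst, int width) {
  assert((width & 15) == 0);
  for (int i = 0; i < width; i += 16) {
    for (int j = 0; j < 16; ++j) {
      dst[i + j] = (uint8_t)(255 - src[i + j]);
    }
  }
}

/* Hypothetical 'any' wrapper: accepts any width and any alignment.
 * The multiple-of-16 part goes through the unaligned kernel and the
 * remaining 0..15 bytes are handled by a scalar loop (an assumption). */
static void InvertRow_Any(const uint8_t* src, uint8_t* dst, int width) {
  int simd_width = width & ~15;
  if (simd_width > 0) {
    InvertRow_Unaligned(src, dst, simd_width);
  }
  for (int i = simd_width; i < width; ++i) {
    dst[i] = (uint8_t)(255 - src[i]);
  }
}
```

With a width of 37, for instance, the first 32 bytes would take the 16-byte path and the last 5 the scalar path.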

The first version will convert all Neon row functions to require aligned
pointers. The calling code will check alignment and fall back to C code for
unaligned cases.
Unlike SSE2 movdqa, this benefits all data sizes, not just 128 bits.
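
The caller-side check could look roughly like the sketch below. It assumes a 16-byte alignment requirement on pointers and strides and a width that is a multiple of 16; `IsAligned16`, `CopyRow_C`, `CopyRow_Aligned` and `ChooseCopyRow` are hypothetical names for illustration, and the aligned function is a portable stand-in for a Neon row function that would use `vld1.8 {q0},[r0,:128]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the address is a multiple of 16 bytes. */
static int IsAligned16(const void* p) {
  return ((uintptr_t)p & 15) == 0;
}

/* Plain C row function: works for any pointer and any width. */
static void CopyRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) {
    dst[i] = src[i];
  }
}

/* Stand-in for a Neon row function that requires aligned pointers.
 * The assert documents the contract the caller must guarantee. */
static void CopyRow_Aligned(const uint8_t* src, uint8_t* dst, int width) {
  assert(IsAligned16(src) && IsAligned16(dst) && (width & 15) == 0);
  for (int i = 0; i < width; i += 16) {
    for (int j = 0; j < 16; ++j) {
      dst[i + j] = src[i + j];
    }
  }
}

typedef void (*CopyRowFn)(const uint8_t* src, uint8_t* dst, int width);

/* Pick the row function once per image: the aligned version only when
 * every row will start on a 16-byte boundary, otherwise the C fallback. */
static CopyRowFn ChooseCopyRow(const uint8_t* src, int src_stride,
                               uint8_t* dst, int dst_stride, int width) {
  if (IsAligned16(src) && IsAligned16(dst) &&
      (src_stride & 15) == 0 && (dst_stride & 15) == 0 &&
      (width & 15) == 0) {
    return CopyRow_Aligned;
  }
  return CopyRow_C;
}
```

Checking the strides as well as the base pointers matters because an aligned first row with an odd stride leaves every later row unaligned.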

Original issue reported on code.google.com by fbarch...@google.com on 4 Oct 2012 at 1:09

GoogleCodeExporter commented 9 years ago
So far, tests are showing that aligned vld and vst don't help?

Original comment by fbarch...@google.com on 8 Oct 2012 at 4:00

GoogleCodeExporter commented 9 years ago
On A15, neither this nor pld is a win, so closing as Won't Fix for now.
If a CPU or test case where it helps comes along, this can be revisited.

Original comment by fbarch...@google.com on 10 Oct 2012 at 4:28