VREV for byte swap on Arm Neon?

gnuradio / volk

The Vector Optimized Library of Kernels

http://libvolk.org

GNU Lesser General Public License v3.0

VREV for byte swap on Arm Neon? #479

Open Triang3l opened 3 years ago

Triang3l commented 3 years ago

The Arm Neon versions of the byte swaps (volk_*_byteswap.h) in VOLK use shifts/OR or lookup tables, somewhat similar to the x86 versions. However, Neon has a dedicated instruction for byte swaps, VREV, usable as vrev16q_u8 for 8-in-16 (bytes within 16-bit elements), vrev32q_u8 for 8-in-32, and vrev64q_u8 for 8-in-64. Are there performance or compatibility reasons for not using it, or was the instruction simply not known when the code was written?
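To make the suggestion concrete, here is a minimal sketch of what direct VREV-based swaps might look like. The function names (`byteswap16_sketch` and so on) and the in-place signature are hypothetical and do not match VOLK's kernel naming or dispatch conventions; only the `vld1q_u8`, `vst1q_u8`, and `vrevNq_u8` intrinsics come from `arm_neon.h`. A scalar path handles the tail, and also stands in for the vector loop when the code is built without Neon.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical helper, not a VOLK kernel: swap the two bytes of each 16-bit word. */
static void byteswap16_sketch(uint16_t *data, size_t n)
{
    size_t i = 0;
#if SKETCH_HAVE_NEON
    /* 8 x 16-bit words per 128-bit register; VREV16 reverses bytes in each halfword. */
    for (; i + 8 <= n; i += 8) {
        uint8x16_t v = vld1q_u8((const uint8_t *)(data + i));
        vst1q_u8((uint8_t *)(data + i), vrev16q_u8(v));
    }
#endif
    /* Scalar tail (or the whole buffer when Neon is unavailable). */
    for (; i < n; i++) {
        data[i] = (uint16_t)((data[i] << 8) | (data[i] >> 8));
    }
}

/* Hypothetical helper: reverse the four bytes of each 32-bit word. */
static void byteswap32_sketch(uint32_t *data, size_t n)
{
    size_t i = 0;
#if SKETCH_HAVE_NEON
    /* 4 x 32-bit words per register; VREV32 reverses bytes in each word. */
    for (; i + 4 <= n; i += 4) {
        uint8x16_t v = vld1q_u8((const uint8_t *)(data + i));
        vst1q_u8((uint8_t *)(data + i), vrev32q_u8(v));
    }
#endif
    for (; i < n; i++) {
        uint32_t x = data[i];
        data[i] = (x << 24) | ((x & 0x0000FF00u) << 8) |
                  ((x >> 8) & 0x0000FF00u) | (x >> 24);
    }
}

/* Hypothetical helper: reverse the eight bytes of each 64-bit word. */
static void byteswap64_sketch(uint64_t *data, size_t n)
{
    size_t i = 0;
#if SKETCH_HAVE_NEON
    /* 2 x 64-bit words per register; VREV64 reverses bytes in each doubleword. */
    for (; i + 2 <= n; i += 2) {
        uint8x16_t v = vld1q_u8((const uint8_t *)(data + i));
        vst1q_u8((uint8_t *)(data + i), vrev64q_u8(v));
    }
#endif
    for (; i < n; i++) {
        uint64_t x = data[i], r = 0;
        for (int b = 0; b < 8; b++) {
            r = (r << 8) | (x & 0xFFu); /* move the lowest byte to the top, one at a time */
            x >>= 8;
        }
        data[i] = r;
    }
}
```

Each vector loop is a load, a single VREV, and a store, with no shift/OR sequence and no table register to set up. Whether that is actually faster than the existing kernels on a given core would still need to be measured.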

jdemel commented 3 years ago

Could you point to the exact code that you have in mind?

Are you referring to this?

https://github.com/gnuradio/volk/blob/237a6fc9242ea8c48d2bbd417a6ea14feaf7314a/kernels/volk/volk_16u_byteswap.h#L223-L239

This implementation is essentially 7 years old. Are you interested in contributing an optimized version of this code?

Triang3l commented 3 years ago

Yes, as well as the vtbl4/vqtbl1q implementations of the 32-bit and 64-bit swaps. I can try setting up the environment on my phone over the weekend, write direct vrev versions, and possibly run tests and some speed comparisons.