RoaringBitmap / roaring

Roaring bitmaps in Go (golang), used by InfluxDB, Bleve, DataDog
http://roaringbitmap.org/
Apache License 2.0
2.49k stars 226 forks source link

Use vector instructions for union2by2 #288

Open lemire opened 3 years ago

lemire commented 3 years ago

We have ARM assembly for union2by2 as per https://github.com/RoaringBitmap/roaring/pull/287

It should be said that there are vectorized algorithm (hint: ARM NEON) for union routines. It appears in the C code (for SSE only but it is portable).

https://github.com/RoaringBitmap/CRoaring/blob/master/src/array_util.c#L1569-L1647

It is described in section 4.3 of the following paper:

It is actually not difficult to implement. For someone that is either fluent in ARM NEON or wants to learn... it is a great project.

jacksonrnewhouse commented 3 years ago

https://github.com/DLTcollab/sse2neon seems like a valuable reference

lemire commented 3 years ago

If someone is interested, I am fluent in the various SIMD dialects. Finding the intrinsics and the instructions is not so difficult. There are few of them to find anyhow.

The hard part is to put it all together.

jacksonrnewhouse commented 3 years ago

Unfortunately golang hasn't ported over UMIN and UMAX. I opened an issue on it, but will probably have to figure out the bitwise representation for now.

jacksonrnewhouse commented 3 years ago

@lemire, the intersection algorithms use _mm_cmpestrm to compare 2 registers of 8 shorts each, and there isn't an equivalent instruction in arm64, I believe. Have any suggestions? My thoughts are to do something like sse_merge, where you run CMEQ between the registers, then rotate 8 times using EXT, and pop_count the result, shifting right by 4 bits (each equal case sets 16 bits). Any suggestions on other approaches?

lemire commented 3 years ago

Though cmpestrm looks really good, it is an expensive instruction with many cycles of latency so the approach you suggest is more competitive than it appears.