More SIMD implementations

Bodigrim / bitvec

Bit vectors: 8x less memory, up to 3500x faster than Vector Bool

https://hackage.haskell.org/package/bitvec

BSD 3-Clause "New" or "Revised" License

73 stars 7 forks source link

More SIMD implementations #66

Open konsumlamm opened 1 year ago

konsumlamm commented 1 year ago

See also #64.

[ ] make operations work for bit vectors with offset
[x] support more operations
- [x] bitIndex (#81)
- [x] nthBitIndex (#81)
- [x] selectBits (#82)
- [x] excludeBits (#82)
- [x] reverseBits (#71)
[ ] support mutable variants

konsumlamm commented 1 year ago

I did some experiments on SIMD implementations for the other operations. However, I wasn't able to find any implementation that can be vectorized by OpenMP, so I resorted to using SIMD intrinsics (unfortunately, this is a lot harder to get right). For most operations, I wrote SSE and AVX versions. Here are my findings (the factors are speedups over the standard implementation):

bitIndex: about the same up to 1024 bits, then up to 0.6x (SSE) and 0.4x (AVX) nthBitIndex should be similar
reverseBits: up to 0.25x (SSE) and 0.15x (AVX)
selectBits: up to 0.36x (when using -mbmi2) and 0.06 (AVX)

Are you fine with using intrinsics? If so, should there be separate flags for SSE and AVX implementations (both are x86-specific, but quite common)?

Bodigrim commented 1 year ago

I'm fine with intrinsics, but not with additional flags.

SSE2 is always available on any x86_64 CPU.
Higher intrinsics require a runtime dispatch by __get_cpuid_count flags. Here are some examples:
- https://github.com/haskell/text/blob/master/cbits/measure_off.c
- https://github.com/haskell/bytestring/blob/master/cbits/is-valid-utf8.c

Bodigrim commented 1 year ago

@konsumlamm I'd like to make a release soon so that consumers could benefit from your work here. Shall I go ahead as is or do you have plans to work on excludeBits / selectBits soon?

konsumlamm commented 1 year ago

I have an implementation of selectBits/excludeBits lying around, that I could make a PR for, but only for the immutable versions. I don't have much time currently, so I can't work on the other things rn.

Bodigrim commented 1 year ago

I'll gladly take immutable versions only.