Lokathor / wide

A crate to help you go wide. By which I mean use SIMD stuff.
https://docs.rs/wide
zlib License

add reduce min/max along with tests. Also optimize i16x8 abs for sse2 #138

Closed mcroomp closed 10 months ago

mcroomp commented 10 months ago

Also, for 256-bit reductions, first reduce the top and bottom halves against each other in parallel, then reduce the resulting half-width vector; this generates better code. Use wrapping_add instead of sum to avoid overflow panics in debug builds, since wrapping arithmetic is assumed everywhere in this library.
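
To illustrate the two points above, here is a minimal scalar sketch that uses a plain `[i32; 8]` array as a stand-in for a 256-bit vector. The function names `reduce_add_halves` and `reduce_max_halves` are hypothetical and are not taken from the crate; the actual change presumably works on the crate's SIMD types and intrinsics rather than on arrays.

```rust
// Hypothetical scalar model of an 8-lane i32 vector (256 bits).
// This is an illustration only, not the crate's implementation.

/// Wrapping horizontal add: fold the high half onto the low half first,
/// then reduce the remaining four lanes.
fn reduce_add_halves(v: [i32; 8]) -> i32 {
    // Step 1: combine the low and high halves lane by lane.
    // On real hardware this would be a single 128-bit vector add.
    let mut half = [0i32; 4];
    for i in 0..4 {
        half[i] = v[i].wrapping_add(v[i + 4]);
    }
    // Step 2: reduce the four remaining lanes.
    // wrapping_add is used instead of Iterator::sum, which panics on
    // overflow when overflow checks are enabled (the default in debug builds).
    half.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// Horizontal max using the same halves-first structure.
fn reduce_max_halves(v: [i32; 8]) -> i32 {
    // Step 1: lane-wise max of the low and high halves.
    let mut half = [0i32; 4];
    for i in 0..4 {
        half[i] = v[i].max(v[i + 4]);
    }
    // Step 2: max over the four remaining lanes.
    half.iter().copied().fold(i32::MIN, i32::max)
}
```

With this sketch, `reduce_add_halves([i32::MAX, 1, 0, 0, 0, 0, 0, 0])` wraps around to `i32::MIN` instead of panicking, which is the behavior the description asks for in debug builds.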