Closed: phi-gamma closed this issue 9 months ago.
Internally, `wide` uses `safe_arch` and compile-time configuration to pick what to do. I don't think that interacts well with how the `multiversion` crate works, since the AVX-versioned function will still see only the SSE cfg and pick the SSE functions.
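Why the AVX version still sees only the SSE cfg can be shown in a few lines: `cfg!(target_feature = ...)` is expanded when the crate is compiled, using the crate-wide target features, so calling such a function from inside a `#[target_feature(enable = "avx2")]` function (which is roughly what a multiversioned clone is) does not change which branch was chosen. A minimal sketch; `selected_path` and `selected_path_from_avx2_fn` are hypothetical names, not `wide`'s or `safe_arch`'s actual code.

```rust
/// Stand-in for a function that picks its implementation with compile-time
/// `cfg` checks (hypothetical; not taken from `wide` or `safe_arch`).
#[inline(always)]
pub fn selected_path() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// Compiled with AVX2 enabled, like a multiversioned clone would be.
/// The `cfg!` checks inside `selected_path` were already resolved from the
/// crate-wide target features, so inlining it here returns the same answer.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
pub unsafe fn selected_path_from_avx2_fn() -> &'static str {
    selected_path()
}
```

Built for a default x86_64 target (no `-C target-feature=+avx2`), both calls report the SSE2 path, even on a CPU that supports AVX2.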
A bit disappointing but it makes sense, thanks! Back to writing unsafe code :/
If you use the nightly `core::simd` API, it should behave more like you want.
Yeah, switching to nightly would solve most of my problems, but unfortunately that isn’t an option for my use case.
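For reference, the stable "unsafe code" route typically means calling `std::arch` intrinsics directly inside a `#[target_feature]` function and selecting it at runtime. A minimal sketch, assuming x86_64 and a simple element-wise add; `add_f32` and `add_f32_avx` are hypothetical names. The AVX path works on 256-bit `ymm` registers (8 `f32` lanes) on stable Rust.

```rust
/// Element-wise `out[i] = a[i] + b[i]` on stable Rust, using 256-bit AVX
/// registers when the CPU supports them and a scalar loop otherwise.
pub fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len(), "length mismatch");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime just above.
            unsafe { add_f32_avx(a, b, out) };
            return;
        }
    }
    // Portable fallback.
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_f32_avx(a: &[f32], b: &[f32], out: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    let n = a.len();
    let mut i = 0;
    // Main loop: 8 f32 lanes per 256-bit register.
    while i + 8 <= n {
        // SAFETY: i + 8 <= n, and all three slices have length n.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            _mm256_storeu_ps(out.as_mut_ptr().add(i), _mm256_add_ps(va, vb));
        }
        i += 8;
    }
    // Scalar tail for the last n % 8 elements.
    while i < n {
        out[i] = a[i] + b[i];
        i += 1;
    }
}
```

The runtime check happens once per call of `add_f32`, not once per 8-lane add, which keeps the branch out of the hot loop.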
I wonder if there’s a way to make `wide` dispatch based on `is_x86_feature_detected!`.
The basic problem is that `wide` has tons and tons of "small" functions, and having each of them do a branch adds up very quickly.
I'm happy to add new types and/or new methods if you come up with anything solid though.
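The difference between branching in every small method and dispatching once per kernel can be made visible with a counter. A minimal sketch: `CHECKS`, `has_avx2`, `add4`, `add4_dispatching`, `sum_fine` and `sum_coarse` are hypothetical names, and the AVX2 branch is left out, so only the number of runtime checks is compared. (`is_x86_feature_detected!` caches its result, so each check is cheap on its own; the cost comes from doing it, and calling through it, once per tiny operation.)

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts runtime feature checks, only to make the per-call cost visible.
static CHECKS: AtomicUsize = AtomicUsize::new(0);

fn has_avx2() -> bool {
    CHECKS.fetch_add(1, Ordering::Relaxed);
    #[cfg(target_arch = "x86_64")]
    let detected = is_x86_feature_detected!("avx2");
    #[cfg(not(target_arch = "x86_64"))]
    let detected = false;
    detected
}

/// Portable 4-lane add, standing in for one "small" SIMD method.
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

/// Per-method dispatch: the small method checks the CPU on every call.
/// (A real type would branch to an AVX2 body here; this sketch only counts.)
fn add4_dispatching(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let _use_avx2 = has_avx2();
    add4(a, b)
}

/// One check per chunk processed.
pub fn sum_fine(chunks: &[[f32; 4]]) -> [f32; 4] {
    chunks.iter().fold([0.0; 4], |acc, &c| add4_dispatching(acc, c))
}

/// One check for the whole kernel, then a branch-free loop.
pub fn sum_coarse(chunks: &[[f32; 4]]) -> [f32; 4] {
    let _use_avx2 = has_avx2();
    chunks.iter().fold([0.0; 4], |acc, &c| add4(acc, c))
}
```

Summing 100 chunks costs 100 checks with `sum_fine` and 1 with `sum_coarse`, while both return the same result.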
Dispatch with `multiversion` is done once; afterwards it only costs one relaxed load. But yeah, that’s gonna be tricky to reconcile with the static type dispatch in `wide`.
After slapping `multiversion` attributes on functions that use `wide` types, I expected the dispatched versions to use AVX2 like they do with `packed_simd`. However, that's not the case in my experiments. LLVM still generates slightly better code than in the default version (with `v*pd` instructions) but doesn't use any 256-bit registers. :/ Can `packed_simd`'s behavior be achieved here in stable code?
Example:
Codegen difference, `wide` vs. `packed_simd`: