finalfusion / finalfrontier

Context-sensitive word embeddings with subwords. In Rust.
https://finalfusion.github.io/finalfrontier

vec_simd: dynamically select SSE or AVX code path #112

Closed: danieldk closed this 4 years ago

danieldk commented 4 years ago

I have made this a draft PR because I haven't checked the performance impact at all. From reading stdarch, it seems that the result of feature detection is cached, so we wouldn't execute the expensive cpuid instruction on every call. But I still want to profile and inspect the generated assembly a bit.

If it works out, dynamic feature detection is really nice: we no longer have to compile explicitly for e.g. AVX. We can compile without any extra target features, and the AVX code path is used whenever the CPU supports it.
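For readers unfamiliar with the mechanism, here is a minimal sketch of runtime dispatch using `is_x86_feature_detected!` and `#[target_feature]` from the standard library. The function names (`dot`, `dot_avx`, `dot_scalar`) are hypothetical and are not taken from finalfrontier's `vec_simd` module. The sketch only shows an AVX path with a scalar fallback; an SSE path would be added in the same way.

```rust
/// Portable fallback (hypothetical helper, not finalfrontier code).
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX code path. `#[target_feature]` lets the compiler emit AVX
/// instructions for this function only, so the rest of the crate can be
/// built without `-C target-feature=+avx`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let n = a.len().min(b.len());
    let split = n - n % 8; // largest multiple of 8 that fits
    let mut lanes = [0.0f32; 8];
    unsafe {
        let mut acc = _mm256_setzero_ps();
        let mut i = 0;
        while i < split {
            // Unaligned loads of 8 floats from each slice.
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
            i += 8;
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    // Horizontal sum of the accumulator plus the scalar tail.
    lanes.iter().sum::<f32>() + dot_scalar(&a[split..n], &b[split..n])
}

/// Public entry point: picks the code path at runtime.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound because AVX support was just verified.
            return unsafe { dot_avx(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

Callers only ever use `dot`; the same binary then runs on CPUs with and without AVX.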

danieldk commented 4 years ago

There is another interesting problem: the SIMD intrinsics are not inlined. Apparently this is intentional, to avoid inlining in such a way that speculative execution can run such instructions on CPUs that do not support them. I'll change it to force inlining.
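As an illustration only, one pattern commonly used in Rust to get intrinsics inlined is to put the loop in an `#[inline(always)]` helper without `#[target_feature]`, and call it from a thin `#[target_feature]` wrapper. The helper is inlined into the wrapper, where the intrinsics can then be inlined too because the target features match. This is a sketch of that pattern, not necessarily the change made in this PR; `scale`, `scale_avx`, and `scale_kernel` are hypothetical names.

```rust
/// Shared kernel (hypothetical). It has no `#[target_feature]` attribute and
/// is always inlined into its caller. It must only be reached through a
/// wrapper that enables AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline(always)]
unsafe fn scale_kernel(xs: &mut [f32], factor: f32) {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let split = xs.len() - xs.len() % 8;
    unsafe {
        let f = _mm256_set1_ps(factor);
        let mut i = 0;
        while i < split {
            let p = xs.as_mut_ptr().add(i);
            _mm256_storeu_ps(p, _mm256_mul_ps(_mm256_loadu_ps(p), f));
            i += 8;
        }
    }
    // Scalar tail for the remaining elements.
    for x in &mut xs[split..] {
        *x *= factor;
    }
}

/// Thin wrapper that enables AVX; the kernel body ends up inlined here.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn scale_avx(xs: &mut [f32], factor: f32) {
    unsafe { scale_kernel(xs, factor) }
}

/// Runtime dispatch with a scalar fallback.
pub fn scale(xs: &mut [f32], factor: f32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // Sound because AVX support was just verified.
            unsafe { scale_avx(xs, factor) };
            return;
        }
    }
    for x in xs.iter_mut() {
        *x *= factor;
    }
}
```

Whether the intrinsics were actually inlined is best confirmed by inspecting the generated assembly, as done in this thread.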

danieldk commented 4 years ago

Performance seems on par with the old implementation now. However, I recall from inspecting the instructions that the compiler used to unroll the loops, which does not seem to happen now.