jonas-schievink closed this 6 years ago.
I have confirmed the inline attribute needs to be there. If the generic form of the function does not get inlined into the function that has the `target_feature` attribute, then it doesn't get the target feature applied. So, for instance, the AVX2 intrinsics get downgraded by the compiler to SSE2 equivalents.
If you have doubts/questions let me know.
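A minimal sketch of the pattern being described, assuming a generic kernel loosely modeled on the `distance` function mentioned in this thread (the names `distance_generic`, `distance_avx2` and the Euclidean-distance body are hypothetical, not the PR's code). The generic helper carries `#[inline(always)]` so that, once inlined into the `#[target_feature(enable = "avx2")]` wrapper, its body is compiled in a context where AVX2 is enabled rather than as a separate baseline-target function.

```rust
/// Hypothetical generic kernel. `#[inline(always)]` asks the compiler to
/// inline it into every caller, so inside the AVX2 wrapper below its body
/// is compiled with AVX2 enabled.
#[inline(always)]
fn distance_generic<T: Copy + Into<f64>>(a: &[T], b: &[T]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let (x, y): (f64, f64) = (x.into(), y.into());
            (x - y) * (x - y)
        })
        .sum::<f64>()
        .sqrt()
}

/// Hypothetical AVX2 entry point; it must only run on CPUs with AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn distance_avx2(a: &[f32], b: &[f32]) -> f64 {
    distance_generic(a, b)
}

/// Safe wrapper: takes the AVX2 path only when the CPU reports support.
pub fn distance(a: &[f32], b: &[f32]) -> f64 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 availability was checked just above.
            return unsafe { distance_avx2(a, b) };
        }
    }
    distance_generic(a, b)
}

fn main() {
    println!("{}", distance(&[0.0, 0.0], &[3.0, 4.0])); // prints "5"
}
```

Whether LLVM would inline the helper anyway without the attribute depends on optimization level and its own heuristics; the attribute just removes that uncertainty.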
Interesting. How exactly did you confirm? With or without optimizations? LLVM should definitely inline the call, as it's the only call for each specific instantiation of `distance`.
One way is to just yank it off, run a benchmark, and see it get ~5x slower with full optimizations on. You could also build a test case on https://godbolt.org/ and look at the assembly. Now that I think about it, the test I was doing used a `pub` function; maybe if it were private, it would tend to get inlined.
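A rough way to reproduce the "yank it off and benchmark" check outside the original crate: time the same kernel under `#[inline(always)]` and `#[inline(never)]`, each called from an AVX2 wrapper. This is a sketch under assumptions: the helpers are hypothetical, and an autovectorized integer loop stands in for the explicit intrinsics discussed in the thread. Timings depend on the machine, so no particular speedup is claimed. Build with `--release`; on godbolt, one would expect only the inlined version to use 256-bit `ymm` registers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Same loop body under two inlining policies (hypothetical helpers).
// Wrapping integer arithmetic lets LLVM autovectorize the reduction.
#[inline(always)]
fn sum_sq_inlined(v: &[u32]) -> u32 {
    v.iter().fold(0u32, |acc, &x| acc.wrapping_add(x.wrapping_mul(x)))
}

#[inline(never)]
fn sum_sq_not_inlined(v: &[u32]) -> u32 {
    v.iter().fold(0u32, |acc, &x| acc.wrapping_add(x.wrapping_mul(x)))
}

// Only the inlined helper's body ends up compiled with AVX2 enabled;
// the #[inline(never)] helper keeps its baseline-target code.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn avx2_inlined(v: &[u32]) -> u32 {
    sum_sq_inlined(v)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn avx2_not_inlined(v: &[u32]) -> u32 {
    sum_sq_not_inlined(v)
}

/// Runs `f` `iters` times and returns the total elapsed wall-clock time.
fn time_it<R, F: FnMut() -> R>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    let data: Vec<u32> = (0..4096).collect();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime.
            let t_in = time_it(10_000, || unsafe { avx2_inlined(black_box(&data)) });
            let t_out = time_it(10_000, || unsafe { avx2_not_inlined(black_box(&data)) });
            println!("inlined: {:?}, not inlined: {:?}", t_in, t_out);
            return;
        }
    }
    println!("AVX2 not available; skipping comparison");
}
```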
For now I'm going to merge this and put the inline back, but we can investigate further.
`target_feature` attributes work independently of `inline` attributes, and `#[inline(always)]` just forces LLVM to inline calls even in debug mode. Also, it would be great if the example explained why all the functions are marked as `unsafe`.
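One possible explanation the example could carry, sketched under assumptions (`add_avx2`, `add_scalar` and `add` are hypothetical names, not the PR's functions): executing AVX2 instructions on a CPU that lacks them is undefined behavior, so the `#[target_feature(enable = "avx2")]` kernel is `unsafe` and documents that contract in a `# Safety` section. Callers reach it through a safe wrapper that checks the CPU at runtime with `is_x86_feature_detected!` and falls back to scalar code otherwise.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32, _mm256_loadu_si256, _mm256_storeu_si256};

/// Adds `a` and `b` element-wise (wrapping) into `out` using AVX2.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2; running these
/// instructions on a CPU without AVX2 is undefined behavior.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i32], b: &[i32], out: &mut [i32]) {
    let n = a.len().min(b.len()).min(out.len());
    let mut i = 0;
    // Process 8 i32 lanes (256 bits) per iteration; unaligned loads/stores.
    while i + 8 <= n {
        unsafe {
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            _mm256_storeu_si256(out.as_mut_ptr().add(i) as *mut __m256i, _mm256_add_epi32(va, vb));
        }
        i += 8;
    }
    // Scalar tail for the remaining (< 8) elements.
    for j in i..n {
        out[j] = a[j].wrapping_add(b[j]);
    }
}

/// Portable fallback with the same wrapping semantics.
fn add_scalar(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// Safe public entry point: only calls the `unsafe` AVX2 kernel after a
/// runtime CPU check.
pub fn add(a: &[i32], b: &[i32], out: &mut [i32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed just above.
            unsafe { add_avx2(a, b, out) };
            return;
        }
    }
    add_scalar(a, b, out);
}

fn main() {
    let a: Vec<i32> = (1..=10).collect();
    let b = vec![10; 10];
    let mut out = vec![0; 10];
    add(&a, &b, &mut out);
    println!("{:?}", out); // prints "[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]"
}
```

With 10 elements, the AVX2 path handles the first 8 in one vector step and the last 2 in the scalar tail, so both code paths get exercised.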