Open vkarak opened 8 years ago
I am sorry, I should have responded sooner.
The optimized code was written specifically for the Intel compiler, because it was the only compiler that could reliably vectorize the loops without intrinsics. That is why it uses Intel-only features.
If we use intrinsics you can either
`__declspec` was used to try to give the compiler extra information. Since we're moving to intrinsics, I'm not going to spend more time on this.
I would be tempted to step back, and use the unoptimized kernels as a starting point.
This turns out to be more general. GCC dislikes `__declspec`s. I'm on it.
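For reference, one common way to keep such annotations portable is to hide them behind a macro. The sketch below assumes the `__declspec` uses in question are alignment hints, which this thread does not confirm. `KERNEL_ALIGN`, `kernel_buf`, `is_aligned`, and `sum_buf` are hypothetical names, not taken from the existing kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portability macro: choose the alignment spelling the
 * compiler understands. Compilers that define __GNUC__ (GCC, Clang, and
 * the Intel compiler on Linux) accept the GNU attribute syntax. */
#if defined(__GNUC__) || defined(__clang__)
#  define KERNEL_ALIGN(n) __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#  define KERNEL_ALIGN(n) __declspec(align(n))
#else
#  define KERNEL_ALIGN(n) _Alignas(n)  /* assumes a C11 compiler */
#endif

/* Example buffer aligned to a 64-byte boundary (illustrative size). */
static KERNEL_ALIGN(64) double kernel_buf[8];

/* Returns nonzero if p is aligned to an n-byte boundary. */
static int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t)p % n) == 0;
}

/* Sums the buffer after checking the alignment precondition. */
static double sum_buf(void)
{
    assert(is_aligned(kernel_buf, 64));
    double s = 0.0;
    for (size_t i = 0; i < 8; ++i)
        s += kernel_buf[i];
    return s;
}
```

With a macro like this, the kernel sources carry a single annotation and the compiler-specific spelling lives in one place.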