Closed: PHILO-HE closed this pull request 4 days ago
Thanks for opening a pull request!
Could you open an issue for this pull request on Github Issues?
https://github.com/apache/incubator-gluten/issues
Then could you also rename commit message and pull request title in the following format?
[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}
See also:
@zhouyuan
Let's add -mno-avx512f -mbmi2 to compile flag, align with Velox.
Currently, we're passing -march=native, which doesn't seem to respect -mno-avx512f.
Using -march=native ends up generating ASM code with vmovdqu8 instructions when the CPU supports AVX512, which makes the binary incompatible with older CPUs and forces the build machine and run machine to be of the same type. I explicitly passed -mno-avx512f -mbmi2, but it didn't do anything: AVX512 instructions were still present in the ASM code. Any idea why it doesn't respect -mno-avx512f?
@surnaik, -mno-avx512f only disables a subset of the AVX512 instructions. I recommend removing -march=native when cross-compiling and setting the correct target CPU for the compiler. For example, you can set -march=haswell if Haswell is used on your side.
If your cluster has diverse CPU architectures, you can just use a generic setting for the compiler. Alternatively, to make the binary fully optimized for newer CPU architectures, I think you have to customize the Gluten build with a different -march set for each CPU architecture, instead of compiling once for universal use.
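To illustrate the build-machine versus run-machine mismatch discussed above, here is a minimal sketch (not part of Gluten or Velox) that compares the ISA extensions the compiler enabled at build time with what the host CPU reports at run time. The function name missingIsaExtensions is hypothetical, and the sketch assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin and the __AVX2__, __BMI2__, __AVX512F__ and __AVX512BW__ predefined macros are available. One caveat: if this code is itself compiled with AVX512 enabled, the compiler may already emit AVX512 instructions before the check runs, so treat it as a diagnostic aid rather than a safe guard.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists ISA extensions that this translation unit was
// compiled with but that the CPU it is running on does not report.
// An empty result means no mismatch was found among the extensions checked.
std::vector<std::string> missingIsaExtensions() {
  std::vector<std::string> missing;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  // Populate the CPU feature data used by __builtin_cpu_supports.
  __builtin_cpu_init();
#ifdef __AVX2__
  if (!__builtin_cpu_supports("avx2")) missing.push_back("avx2");
#endif
#ifdef __BMI2__
  if (!__builtin_cpu_supports("bmi2")) missing.push_back("bmi2");
#endif
#ifdef __AVX512F__
  if (!__builtin_cpu_supports("avx512f")) missing.push_back("avx512f");
#endif
#ifdef __AVX512BW__
  if (!__builtin_cpu_supports("avx512bw")) missing.push_back("avx512bw");
#endif
#endif
  // On other compilers or architectures nothing is checked.
  return missing;
}
```

Calling missingIsaExtensions() early at startup and logging any returned names would show, for example, that a binary built with -march=native on an AVX512 machine is being run on a Haswell node.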
-mno-avx512f should take effect after -march=native.
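One way to check whether a given flag combination actually took effect is to look at the compiler's predefined macros. The sketch below is an illustration only, assuming GCC or Clang on x86; the function name enabledIsaExtensions is hypothetical. It reports which of a few ISA extensions were enabled for the translation unit it is compiled in.

```cpp
#include <string>

// Hypothetical helper: returns a space-separated list of the ISA extensions
// (among those checked here) that the compiler enabled for this translation
// unit, based on its predefined macros. Returns "none" if none are enabled.
std::string enabledIsaExtensions() {
  std::string out;
#ifdef __AVX2__
  out += "avx2 ";
#endif
#ifdef __BMI2__
  out += "bmi2 ";
#endif
#ifdef __AVX512F__
  out += "avx512f ";
#endif
#ifdef __AVX512BW__
  out += "avx512bw ";
#endif
  return out.empty() ? std::string("none") : out;
}
```

Building this once with -march=native -mno-avx512f and once with the two flags in the opposite order, then printing the result, would show whether the ordering matters for the compiler in use. The same macros can also be listed without any source file via gcc -march=native -mno-avx512f -dM -E - < /dev/null.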
What changes were proposed in this pull request?
This is legacy code inherited from Gazelle. It is also not the correct place for this compiler setting if we want it to apply when compiling all native code.