easybuilders / easybuild-easyblocks

Collection of easyblocks that implement support for building and installing software with EasyBuild.
https://easybuild.io
GNU General Public License v2.0
106 stars 285 forks

enhance custom easyblock for GCC to use `with-arch` option for nvptx with 13.1+ #3396

Closed Thyre closed 2 months ago

Thyre commented 3 months ago

Motivation

Like other compilers, GCC supports offloading via OpenMP and OpenACC, which lets programs make use of accelerators. Some compilers, such as Clang, require CUDA to be present for this. GCC has no such requirement for simply building and running an executable that contains offloading code.

By default, however, GCC targets a very old architecture for NVIDIA GPUs. In GCC 12.3.0, this was sm_30. In GCC 13.3.0 the default is still the same, but recent nvptx-tools can bump it to sm_50 when CUDA is detected. This lets GCC work around the removal of sm_3x in more recent CUDA versions, avoiding the following error message:

GCC 12.3.0

$ gcc -fopenmp -foffload=nvptx-none test.c
ptxas fatal   : Value 'sm_35' is not defined for option 'gpu-name'
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /p/software/fs/jurecadc/stages/2024/software/GCCcore/12.3.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.3.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/p/software/jurecadc/stages/2024/software/binutils/2.40-GCCcore-12.3.0/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

GCC 13.3.0

$ gcc --verbose -fopenmp -foffload=nvptx-none test.c
[...]
/p/software/fs/jurecadc/stages/2025/software/GCCcore/13.3.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.3.0/accel/nvptx-none/lto1 -quiet -dumpbase ./a.xnvptx-none.mkoffload -m64 -mgomp -misa=sm_30 -version -fno-openacc -fno-pie -fcf-protection=none -foffload-abi=lp64 -fopenmp @/tmp/ccZrKG87 -o /tmp/ccbOTp8K.s
[...]
Verifying sm_30 code with sm_50 code generation.
 ptxas -c -o /dev/null /tmp/cc7PheNR.o --gpu-name sm_50 -O0
[...]

However, this may break again as soon as NVIDIA removes support for sm_50, which has been deprecated since CUDA 11.0. Fortunately, GCC has added a configure option to override the default nvptx architecture. Beginning with GCC 13.1.0, one can pass --with-arch=sm_[x] to set the default, as long as GCC understands the given value.

In addition, choosing a newer architecture by default might bring performance improvements and access to additional features.
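The version condition for the configure option can be illustrated with a minimal sketch. The helper name with_arch_option and its arguments are hypothetical and chosen for illustration only; this is not the code of the easyblock. It assumes a plain dotted numeric version string.

```python
def with_arch_option(gcc_version, nvptx_arch):
    """Return the configure option string, or '' when it does not apply.

    Hypothetical helper for illustration; not part of EasyBuild or GCC.
    """
    # Compare only (major, minor); assumes a plain dotted numeric version string.
    major_minor = tuple(int(part) for part in gcc_version.split(".")[:2])
    # The option is only understood from GCC 13.1 on, and only makes sense
    # when an architecture was actually selected.
    if nvptx_arch is None or major_minor < (13, 1):
        return ""
    return "--with-arch=" + nvptx_arch


for version in ("12.3.0", "13.1.0", "14.2.0"):
    print(version, repr(with_arch_option(version, "sm_75")))
```

Returning an empty string for older versions or a missing architecture means the caller can append the result to the configure options unconditionally.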

Scope of this PR

This pull request adds the new option --with-arch=sm_[x] to GCC builds starting with GCC 13.1.0 if offloading support via nvptx is enabled. To choose which architecture is passed, a new function named map_nvptx_capability is implemented. This function retrieves cuda_compute_capabilities and matches them against the official GCC mappings used for the -march-map= argument, which can be found in ${GCC_SRC}/gcc/config/nvptx/nvptx.opt.
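As a rough illustration of how such mappings could be read, the sketch below parses text in an assumed nvptx.opt layout, where each march-map= entry is followed by a line containing Alias(misa=,sm_XX). The sample text is illustrative and not copied from a GCC release, the function name parse_march_map is hypothetical, and the actual map_nvptx_capability implementation may work differently.

```python
import re

# Illustrative text in the ASSUMED layout of nvptx.opt; not copied from GCC.
SAMPLE_OPT = """\
march-map=sm_70
Target RejectNegative Alias(misa=,sm_70)

march-map=sm_72
Target RejectNegative Alias(misa=,sm_70)

march-map=sm_75
Target RejectNegative Alias(misa=,sm_75)

march-map=sm_80
Target RejectNegative Alias(misa=,sm_80)
"""

# Match a "march-map=sm_XX" line and the alias target on the following line.
MAP_RE = re.compile(
    r"^march-map=(sm_\d+)\n[^\n]*Alias\(misa=,(sm_\d+)\)", re.MULTILINE
)


def parse_march_map(opt_text):
    """Return a dict from requested architecture to the one GCC maps it to."""
    return dict(MAP_RE.findall(opt_text))


mappings = parse_march_map(SAMPLE_OPT)
print(mappings)
```

If the file layout changes, the regular expression simply finds nothing and the result is an empty dict, which a caller can treat as "fall back to the GCC default".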

Since GCC allows only a single default architecture to be set, I decided to use the lowest one available. For example, JURECA-DC sets both 7.5 and 8.0 for EasyBuild, so 7.5 would be chosen. If parsing the architecture mappings fails, for example because the file layout changed or the file was moved, a warning is issued and we stick to the default of GCC. The same happens if none of the architectures in cuda_compute_capabilities can be mapped. This makes the additions more resilient to upstream changes.

Generally, this helps users because they no longer have to pass an architecture manually every single time, as is currently the case with CUDA 12 + GCC 12.3.0, where one needs to pass -foffload-options=-misa=sm_80.
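The selection rule can be sketched as follows. The function name pick_default_arch and the mappings dict are hypothetical, and the sketch assumes capabilities are plain "major.minor" strings. It also assumes that the lowest capability that can be mapped is used, which is one reading of the description above and not necessarily what the easyblock does.

```python
import warnings


def pick_default_arch(cuda_compute_capabilities, mappings):
    """Return the mapped sm_XX for the lowest mappable capability, or None.

    Hypothetical helper for illustration; `mappings` is a dict such as
    {"sm_75": "sm_75", "sm_80": "sm_80"}.
    """
    # Sort numerically by (major, minor) so that e.g. "10.0" sorts after "8.0".
    ordered = sorted(
        cuda_compute_capabilities,
        key=lambda cc: tuple(int(part) for part in cc.split(".")),
    )
    for cc in ordered:
        arch = "sm_" + cc.replace(".", "")
        if arch in mappings:
            return mappings[arch]
    # Nothing could be mapped: warn and let the caller keep GCC's default.
    warnings.warn("no nvptx architecture mapping found, keeping GCC default")
    return None


example_mappings = {"sm_75": "sm_75", "sm_80": "sm_80"}
print(pick_default_arch(["8.0", "7.5"], example_mappings))
```

With the JURECA-DC example of 7.5 and 8.0, this picks sm_75. Returning None when nothing maps keeps the build on the GCC default.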

SebastianAchilles commented 3 months ago

Test report by @sebastianachilles

Overview of tested easyconfigs (in order)

Build succeeded for 5 out of 5 (5 easyconfigs in total)
jscclxc1.int.jsc-clx.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, Intel Xeon Processor (Cascadelake) (cascadelake), Python 3.9.18
See https://gist.github.com/SebastianAchilles/6890d9cc1f0024fe7541e064ba5009f8 for a full test report.

Thyre commented 3 months ago

Thanks a lot for the review. I agree with your comments and am working on incorporating them into the PR.

Thyre commented 3 months ago

Fixed the failed test workflow (https://github.com/easybuilders/easybuild-easyblocks/actions/runs/10196832799). I had missed one f-string.

SebastianAchilles commented 3 months ago

Test report by @sebastianachilles

Overview of tested easyconfigs (in order)

Build succeeded for 5 out of 5 (5 easyconfigs in total)
jscclxc1.int.jsc-clx.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, Intel Xeon Processor (Cascadelake) (cascadelake), Python 3.9.18
See https://gist.github.com/SebastianAchilles/2b79825a144baa828025421933035a68 for a full test report.

boegel commented 2 months ago

@boegelbot please test @ jsc-zen3 EB_ARGS="GCCcore-10.2.0.eb GCCcore-12.3.0.eb GCCcore-14.2.0.eb --installpath /tmp/$USER/pr-3396"

boegelbot commented 2 months ago

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3396 EB_ARGS="GCCcore-10.2.0.eb GCCcore-12.3.0.eb GCCcore-14.2.0.eb --installpath /tmp/$USER/pr-3396" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3396 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

Test results coming soon (I hope)...

*- notification for comment with ID 2341789066 processed* *Message to humans: this is just bookkeeping information for me, it is of no use to you (unless you think I have a bug, which I don't).*

boegel commented 2 months ago

Test report by @boegel

Overview of tested easyconfigs (in order)

Build succeeded for 5 out of 5 (3 easyconfigs in total)
node3900.accelgor.os - Linux RHEL 8.8, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA A100-SXM4-80GB, 545.23.08, Python 3.6.8
See https://gist.github.com/boegel/1dcd0f4c7656396fcdacc42dfa4f04f7 for a full test report.

boegelbot commented 2 months ago

Test report by @boegelbot

Overview of tested easyconfigs (in order)

Build succeeded for 3 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/c33373ba82e48b22ebec6b3a5aa2dc71 for a full test report.