Thyre closed 2 months ago
Test report by @sebastianachilles
Build succeeded for 5 out of 5 (5 easyconfigs in total) jscclxc1.int.jsc-clx.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, Intel Xeon Processor (Cascadelake) (cascadelake), Python 3.9.18 See https://gist.github.com/SebastianAchilles/6890d9cc1f0024fe7541e064ba5009f8 for a full test report.
Thanks a lot for the review. I agree with your comments and am working on adding them to the PR.
Fixed the failed test workflow: https://github.com/easybuilders/easybuild-easyblocks/actions/runs/10196832799 I missed one f-string.
Test report by @sebastianachilles
Build succeeded for 5 out of 5 (5 easyconfigs in total) jscclxc1.int.jsc-clx.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, Intel Xeon Processor (Cascadelake) (cascadelake), Python 3.9.18 See https://gist.github.com/SebastianAchilles/2b79825a144baa828025421933035a68 for a full test report.
@boegelbot please test @ jsc-zen3 EB_ARGS="GCCcore-10.2.0.eb GCCcore-12.3.0.eb GCCcore-14.2.0.eb --installpath /tmp/$USER/pr-3396"
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3396 EB_ARGS="GCCcore-10.2.0.eb GCCcore-12.3.0.eb GCCcore-14.2.0.eb --installpath /tmp/$USER/pr-3396" EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3396 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh
' executed!
Submitted batch job 4839
Test results coming soon (I hope)...
Test report by @boegel
Build succeeded for 5 out of 5 (3 easyconfigs in total) node3900.accelgor.os - Linux RHEL 8.8, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA A100-SXM4-80GB, 545.23.08, Python 3.6.8 See https://gist.github.com/boegel/1dcd0f4c7656396fcdacc42dfa4f04f7 for a full test report.
Test report by @boegelbot
Build succeeded for 3 out of 3 (3 easyconfigs in total) jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18 See https://gist.github.com/boegelbot/c33373ba82e48b22ebec6b3a5aa2dc71 for a full test report.
Motivation
GCC, like other compilers, lets users offload work to accelerators, for example via OpenMP and OpenACC. While some compilers, e.g. Clang, require CUDA to be present for this, GCC does not need it to build and run an executable containing offloading code.
By default, however, GCC targets a very low architecture for NVIDIA GPUs. In GCC 12.3.0, this was sm_30. In GCC 13.3.0, the default is still the same, but recent nvptx-tools can bump this to sm_50 when CUDA is detected. With this, GCC can work around the removal of sm_3x in more recent CUDA versions, avoiding the following error message:
GCC 12.3.0
GCC 13.3.0
However, this may break once again as soon as NVIDIA removes the already deprecated (as of CUDA 11.0) support for sm_50. Fortunately, GCC has added a configure option to override the default nvptx architecture. Beginning with GCC 13.1.0, one can pass --with-arch=sm_[x] to set the default, as long as GCC understands the given architecture.
In addition, choosing a newer architecture by default might bring performance improvements and access to additional features.
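The version gate described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name nvptx_arch_configure_option and its signature are hypothetical, and the only facts taken from the description are the --with-arch=sm_[x] option and the GCC 13.1.0 threshold.

```python
def nvptx_arch_configure_option(gcc_version, arch):
    """Return the configure option for the default nvptx architecture.

    Hypothetical helper for illustration only. Returns None when no
    architecture was chosen or when GCC is older than 13.1.0, in which
    case GCC's built-in default is left untouched.
    """
    # Compare only major.minor, e.g. "13.3.0" -> (13, 3)
    major_minor = tuple(int(part) for part in gcc_version.split(".")[:2])
    if arch is None or major_minor < (13, 1):
        return None
    return "--with-arch=%s" % arch


if __name__ == "__main__":
    print(nvptx_arch_configure_option("13.3.0", "sm_75"))
    print(nvptx_arch_configure_option("12.3.0", "sm_75"))
```

Returning None rather than an empty string makes it explicit to the caller that nothing should be appended to the configure options.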
Scope of this PR
This pull request adds the new option --with-arch=sm_[x] to GCC builds starting with GCC 13.1.0 if offloading support via nvptx is enabled. To choose which architecture is passed, a new function named map_nvptx_capability is implemented. This function retrieves cuda_compute_capabilities and matches them against the official GCC mappings used for the -march-map= argument (which can be found in ${GCC_SRC}/gcc/config/nvptx/nvptx.opt).
Since GCC only allows setting a single default architecture, I decided to use the lowest one available. For example, JURECA-DC sets both 7.5 and 8.0 for EasyBuild. Therefore, 7.5 would be chosen. If parsing the architecture mappings fails, for example because the file layout changed or the file was moved, a warning is issued. In this case, we stick to the default of GCC. This is also the case if the architectures in cuda_compute_capabilities cannot be mapped at all. This makes the additions more resilient to upstream changes.
Generally, this helps users, as they are not required to pass architectures manually every single time, as is currently the case with CUDA 12 + GCC 12.3.0. There, one would need to pass -foffload-options=-misa=sm_80.
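The selection logic described above could look roughly like the sketch below. It is not the actual map_nvptx_capability implementation from this PR. The function names, the regular expression, and the sample text are assumptions for illustration; in particular, the assumed layout of the -march-map= entries in nvptx.opt (an option line followed by an Alias(misa=,...) line) should be checked against the real file, and the sample mappings are not copied from GCC.

```python
import re

# Illustrative sample only; assumed to resemble entries in nvptx.opt.
SAMPLE_NVPTX_OPT = """\
march-map=sm_75
Target RejectNegative Alias(misa=,sm_75)

march-map=sm_80
Target RejectNegative Alias(misa=,sm_80)

march-map=sm_86
Target RejectNegative Alias(misa=,sm_80)
"""

# Assumed layout: "march-map=sm_XX" followed by a line with "Alias(misa=,sm_YY)"
MARCH_MAP_RE = re.compile(
    r"^march-map=(sm_\d+)\s+Target RejectNegative Alias\(misa=,(sm_\d+)\)",
    re.MULTILINE,
)


def parse_march_map(opt_text):
    """Return a dict mapping requested architectures to GCC-supported ones."""
    return dict(MARCH_MAP_RE.findall(opt_text))


def pick_default_arch(cuda_compute_capabilities, opt_text):
    """Pick the lowest compute capability that GCC can map, or None.

    None means "keep GCC's default": either the mappings could not be
    parsed or none of the capabilities could be mapped.
    """
    mapping = parse_march_map(opt_text)
    if not mapping:
        print("WARNING: could not parse nvptx architecture mappings")
        return None
    # Sort numerically so that e.g. "7.5" comes before "10.0"
    ordered = sorted(
        cuda_compute_capabilities,
        key=lambda cc: tuple(int(part) for part in cc.split(".")),
    )
    for cc in ordered:
        arch = mapping.get("sm_" + cc.replace(".", ""))
        if arch is not None:
            return arch
    return None


if __name__ == "__main__":
    # JURECA-DC example from the description: 7.5 and 8.0 are configured
    print(pick_default_arch(["8.0", "7.5"], SAMPLE_NVPTX_OPT))
```

With the sample mappings, the JURECA-DC example selects sm_75. Whether an unmappable lowest capability should fall through to the next one, as this sketch does, is a design choice the description leaves open.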