Micket closed this issue 1 year ago.
> I think the former favours putting CUDA in the toolchain (`Foo-1.2.3-nvompic-22.7.eb`, with `nvompic-22.7-CUDA-11.7.0.eb`), the latter favours putting it in the `Foo-1.2.3-nvompi-22.7-CUDA-11.7.0.eb` package?
> * Do either of the above considerations matter?
I think you are correct, though I think it's not that bad either way for downstream; if you have a good template to follow, just `--try-amend` and it should be really easy to switch anyway. Plus, you'd have to consider stuff from GCCcore, UCX-CUDA, UCC-CUDA, and NCCL regardless, which diminishes the differences even more.
We discussed this during the Zoom meeting and the conclusion was `nvoff` (with some risk of this name being misconstrued, so we're open to suggestions; I personally found `nvoflf` to be worse). I plan to make a PR with essentially:

* `NVHPC-22.7-CUDA-11.7.0.eb`
* `nvompi-22.7-CUDA-11.7.0.eb` (also depends on UCX-CUDA, UCC-CUDA, maybe NCCL(?) directly, because we can)
* `nvoff-22.7-CUDA-11.7.0.eb`
Thomas asked @SebastianAchilles what they did for BLAS in their toolchains:

> Currently we still rely on imkl on the CPUs, and cuBLAS, cuFFT and cuSOLVER on GPUs.

But I think with FlexiBLAS we should have the option of using any CPU backend, so that's just better I think (I still haven't tried switching BLAS backends with FlexiBLAS, so I'm not sure how that actually works).
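For what it's worth, switching backends with FlexiBLAS is done via an environment variable (or its config files), no relinking needed; a quick sketch, where the backend names and `./my_app` are illustrative and depend on how FlexiBLAS was built:

```shell
# list the BLAS backends this FlexiBLAS installation knows about
flexiblas list

# run the same (unmodified) binary against different backends
FLEXIBLAS=NETLIB ./my_app
FLEXIBLAS=OPENBLAS ./my_app
```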
I would also be interested in defining an NVHPC-based toolchain :+1:
Just for reference, these are the previous attempts to create the `nvompic` toolchain for the 2020b and 2021a toolchain generations:

Currently at JSC we are using the toolchain `nvompic`. I added this toolchain definition a while ago to upstream: https://github.com/easybuilders/easybuild-framework/pull/3735. `nvompi` was added later as well: https://github.com/easybuilders/easybuild-framework/pull/3969. However, in my opinion it doesn't make sense to have both definitions.
The reason for adding the `c` suffix (which stands for CUDA) is the following: you can either use an external CUDA with NVHPC, or the CUDA shipped with NVHPC. So no matter if we call it `nvompi` or `nvompic`, CUDA will always be pulled in, either directly or indirectly. Personally I would prefer `nvompic`, to point out the CUDA dependency.
For the top-level toolchain NVHPC+OpenMPI+FlexiBLAS+FFTW I would like to add another name suggestion to `nvoff` and `nvoflf`: `nvofbf`. This is mainly to be consistent with the toolchain names we already have: `gofbf` (which is not used anymore, because from 2021a on it is called `foss`) and the `gfbf` toolchain (GCC + FlexiBLAS + FFTW): https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/g/gfbf/gfbf-2022a.eb
I got stuck on this because there were several things that didn't want to build easily: OpenMPI and BLIS have issues. @bartoldeman suggested some patches he wrote for CC, but I haven't had time to test anything out yet.
I have no strong opinion on whether it's called "nvompic" or "nvompi". The motivation for the latter was mostly to make it similar to `gompi`, i.e. no `c` suffix, since it will have a `-CUDA-xx.x.x` suffix anyway.

Buuuuut that's before I realized how things would pan out (it's been a long time since we had compilers with a suffix), so that changes the situation somewhat. Since the compiler would have the actual versionsuffix, `NVHPC-22.7-CUDA-11.7.0.eb`, then `nvompi(c)` (and everything using this NVHPC toolchain) would have the toolchain version specified as `'22.7-CUDA-11.7.0'`; they wouldn't have any versionsuffix themselves, it would still just be `nvompi(c)-2022a.eb`.
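A rough sketch of how that naming scheme would fit together; the version numbers and file contents here are illustrative only, not a tested easyconfig:

```python
# nvompi-2022a.eb -- illustrative sketch, assumed contents
easyblock = 'Toolchain'

name = 'nvompi'
version = '2022a'

# the -CUDA-11.7.0 versionsuffix lives on the NVHPC dependency;
# nvompi itself carries no versionsuffix, so the file is just nvompi-2022a.eb
dependencies = [
    ('NVHPC', '22.7', '-CUDA-11.7.0'),
    ('OpenMPI', '4.1.4', '', ('NVHPC', '22.7-CUDA-11.7.0')),
]

moduleclass = 'toolchain'
```

Everything built on top of this toolchain would then carry `nvompi-22.7-CUDA-11.7.0` (the full toolchain version) in its filename automatically.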
So... nvompi, nvompic, meh, sure. Also, nvofbf, sure, I guess.
On a related note: I recently found that CUDA bundles nvcc. Considering we exclude all the bundled stuff, it kinda makes me wonder if we even need NVHPC itself for anything..?

I've also had second thoughts when encountering yet more stuff to fix for the ever-broken nvcc compiler: is nvcc any good at building CPU code in general? It certainly doesn't seem well tested, and NVIDIA themselves don't seem to bother using nvcc to build their own bundled OpenMPI, which is quite telling. I haven't done any benchmarking, but I wouldn't expect it to matter the slightest bit whether OpenMPI is compiled with nvcc or GCC here, and I wouldn't expect nvcc to be especially good at building OpenBLAS. So, spending time patching a build of OpenMPI or OpenBLAS or whatever just for the sake of using `CC=nvcc`, only to produce a possibly slower version, would just be counterproductive. I basically just want to build e.g. VASP with nvcc; I don't care the slightest bit whether all the deps and builddeps also insisted on using nvcc to build.
Perhaps something like `toolchainopts = {'use_gcc': True}` could both speed things up and avoid annoying patching due to the limitations of nvcc. Maybe I'm wrong, perhaps nvcc is fricking amazing at CPU stuff as well, or maybe I should just stick to foss/2022a and use the nvcc compiler bundled with CUDA to build my ~2 top-level applications that use it.
My opinion on this is to just use `nvompi` and `nvofbf`, and not use `nvompic`, ignoring the fact that NVHPC already ships with a CUDA -- sure, users can make use of it, but easyblocks often make assumptions based on consistency, e.g. they use `get_software_root('CUDA')` etc., so it simplifies life a lot if things built with EasyBuild use an EB CUDA module.
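The consistency assumption can be illustrated with a small self-contained sketch; `get_software_root` here is a stub mimicking EasyBuild's real helper, which resolves installation prefixes via the `$EBROOT<NAME>` environment variables that EB-generated modules export (the `--with-cuda` flag is just an example configure option, not tied to any particular easyblock):

```python
import os

def get_software_root(name):
    # stub of EasyBuild's get_software_root(): EB-generated modules export
    # $EBROOT<NAME> (uppercased, dashes stripped) for each loaded dependency
    return os.environ.get('EBROOT' + name.upper().replace('-', ''))

# typical easyblock pattern: only wire up CUDA support when an EB CUDA
# module is actually loaded -- a CUDA hidden inside NVHPC is invisible here
cuda_root = get_software_root('CUDA')
configopts = '--with-cuda=%s' % cuda_root if cuda_root else '--without-cuda'
```

This is why a CUDA bundled inside NVHPC doesn't help: without an EB CUDA module there is no `$EBROOTCUDA`, and such checks silently disable CUDA support.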
Now I did install an NVHPC 22.7 locally yesterday. Here are some notes:

* `configopts += ' CC=pgcc CXX=pgc++ FC=pgfortran'` (will edit later why exactly this is still needed)
* `local_extra_flags = "-D__ELF__"` and `toolchainopts = {'pic': True, 'extra_cflags': local_extra_flags, 'extra_fflags': local_extra_flags}`
* `'configopts': '-DABI=Intel',` for the FlexiBLAS component (since nvfortran, like ifort, uses the g77 API for returning complex numbers from functions). Then there is also the issue that the lib is called `libflexiblas_intel.so`, so framework may need an adjustment for ScaLAPACK (or we need to create a symlink `libflexiblas.so -> libflexiblas_intel.so` in `postinstallcmds`).
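The symlink workaround could look something like this in easyconfig terms; the `lib` directory layout is assumed, and `%(installdir)s` is an EasyBuild template expanded at install time:

```python
# create libflexiblas.so -> libflexiblas_intel.so so software expecting the
# generic name still links (illustrative sketch, untested)
postinstallcmds = [
    "cd %(installdir)s/lib && ln -sf libflexiblas_intel.so libflexiblas.so",
]
```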
* BLIS doesn't know `nvc` compiler flags. Using `preconfigopts = "sed -i 's/LINKER.*/LINKER := nvc/' common.mk &&"` and `configopts += '--complex-return=intel CC=gcc CFLAGS="-O2 -ftree-vectorize -march=native -fno-math-errno"'` makes it compile, but it's ugly; a patch to BLIS would be better (hopefully not too hard).
* Found what causes some issues with Open MPI and also FFTW: it comes from libtool not supporting `nvc`.
For `CC=nvc` it sets

`lt_prog_compiler_pic=' -fPIC -DPIC'`
`lt_prog_compiler_static=''`

but for `CC=pgcc` it sets

`lt_prog_compiler_pic='-fpic'`
`lt_prog_compiler_static='-Bstatic'`

This caused some strange errors linking shared libraries.
Hi there. The need for MPI/OpenACC use is definitely there, and getting the NVHPC toolchains in place would be awesome. What is remaining here? Just compiling with NVHPC and not using CUDA certainly has use cases. Whether CUDA should be automatically installed or available when loading, say, an NVHPC toolchain, I do not know, but maybe not, to keep it as slim as possible.
> What is remaining here?

Someone needs to do it.

> If CUDA should be automatically installed or available, when loading say a NVHPC toolchain I do not know, but maybe not to keep it as slim as possible.

That's not a reason we actually care about. We need to keep a separate CUDA package because
@bartoldeman did you (or someone else) mention at a previous conference call that there's something that needs to be taken into account regarding the various OpenMP libraries? Particularly if using `gcc` as the C compiler instead of `nvc`. Or were you reporting above that it's possible to build everything with `nvc` (just that some things require pretending that it's `pgcc`)?
Thanks for the update.

> Someone needs to do it.

Maybe we can contribute. I will check.

> That's not a reason we actually care about. We need to keep a separate CUDA package because

Good, so can we then consider the CUDA side of this settled? Since we anyway need CUDA as a separate package, and there are certainly use cases for using the `nvhpc` compiler without CUDA, users would have to specify it, meaning we also make `nvompic` etc. where these CUDA packages are included?
I dusted off my old easyconfigs, adding all the stuff @bartoldeman mentioned into #16724
We have a toolchain now (edit: cfr. #16724)
Perhaps everyone already knows this and I'm just slow, but it's not clear to me how we want to deal with NVHPC, nvompi, nvompic. Some questions I have are:

* `NVHPC` and then `nvompi`, and only use `Foo-1.2.3-nvompi-22.7-CUDA-11.7.0.eb`?

In my mind these toolchains aren't like GCC: there is no compatibility with CUDA to worry about, and anyone using it definitely wants to use CUDA as well, so there just isn't any real need to offer an "opt-in". I would have just done `NVHPC` with a particular system CUDA, then just built OpenMPI (depending on UCX-CUDA right away) and called that `nvompic`. No `nvompi`.