G-071 / octotiger-spack


BLAST test does not get disabled on NVIDIA Grace #7

diehlpk closed this issue 5 months ago

diehlpk commented 5 months ago

This check fails on NVIDIA Grace Hopper https://github.com/G-071/octotiger-spack/blob/a7d1dc31f33fa578003c9c1e2fb91f03f85c8358/packages/octotiger/package.py#L255.

And we should check for aarch64 as well.
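A minimal sketch of the suggested fix, assuming the current check matches specific microarchitecture names (the actual logic at the linked line is not reproduced here): decide on the target *family* instead. On the machine used later in this thread, Spack reported the microarchitecture as `neoverse_n1`, while Grace's cores are Arm Neoverse V2, so a name-based check can easily miss it. A family-level check for `aarch64` covers both. The function names and the small mapping table are hypothetical. Inside a real Spack package, the same idea would more likely be expressed as a family-level target constraint such as `self.spec.satisfies("target=aarch64:")`.

```python
# Hypothetical helper; NOT the actual check from the Octo-Tiger package.py.
# Assumption: the BLAST test is skipped on every target whose *family* is
# listed below, instead of matching individual microarchitecture names.

# Assumed mapping from Spack-reported microarchitecture names to families.
# In Spack itself this information comes from archspec; this table is only
# large enough for the example.
MICROARCH_FAMILY = {
    "neoverse_n1": "aarch64",  # what Spack reported on the Grace Hopper node
    "neoverse_v2": "aarch64",  # Grace's actual core type
    "zen3": "x86_64",
    "skylake_avx512": "x86_64",
}

# Families for which the BLAST test is disabled (aarch64 added per this issue).
BLAST_DISABLED_FAMILIES = {"aarch64"}


def target_family(target: str) -> str:
    """Return the family of a microarchitecture name, or the name itself
    if it is already a generic family such as 'aarch64'."""
    return MICROARCH_FAMILY.get(target, target)


def blast_test_enabled(target: str) -> bool:
    """True if the BLAST test should be built and run for this target."""
    return target_family(target) not in BLAST_DISABLED_FAMILIES
```

Matching on the family means new Arm cores only need to be known as `aarch64`, not added one by one to the check.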

G-071 commented 5 months ago

Overall, the Octo-Tiger Spack package should now work on Grace for GPU builds without SVE, and for SVE builds without GPU support. Enabling both does not currently work: SVE requires std::experimental::simd, and the scalar std::experimental::simd types do not yet compile on the GPU. Combined builds with SVE on the host and GPU support enabled are therefore not possible without first investing extra work to fix this.
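The rule above (no SVE together with GPU support) can be sketched as a simple validation step. This is a hypothetical helper operating on a plain dictionary of variant values. The value `SVE` for `simd_extension` is an assumption (only `SCALAR` appears in this thread), while `cuda`, `rocm` and `sycl` are the GPU variants visible in the concretized Octo-Tiger spec. In a Spack package the same rule would more naturally be a `conflicts(...)` directive, for example `conflicts("simd_extension=SVE", when="+cuda")`, again assuming that variant value.

```python
# Hypothetical validator mirroring the constraint described above:
# SVE needs std::experimental::simd, whose scalar types do not compile on
# the GPU, so an SVE host build cannot be combined with GPU support.
# Assumed: simd_extension="SVE" selects SVE; cuda/rocm/sycl are booleans.

GPU_VARIANTS = ("cuda", "rocm", "sycl")


def check_simd_gpu_combination(variants: dict) -> None:
    """Raise ValueError for the unsupported SVE + GPU combination."""
    gpu_on = [v for v in GPU_VARIANTS if variants.get(v, False)]
    if variants.get("simd_extension") == "SVE" and gpu_on:
        raise ValueError(
            "simd_extension=SVE cannot be combined with GPU support "
            f"(enabled: {', '.join(gpu_on)})"
        )


# The GPU build used in this thread passes; an SVE CPU-only build passes too.
check_simd_gpu_combination(
    {"simd_library": "KOKKOS", "simd_extension": "SCALAR", "cuda": True}
)
check_simd_gpu_combination({"simd_extension": "SVE", "cuda": False})
```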

Details to reproduce my single-node, non-SVE, GPU-accelerated build on Grace Hopper:

Versions:

Instructions to create a dev-build (assuming you are inside an Octo-Tiger source repository with the correct commit and all submodules checked out):

> spack dev-build --fresh --until cmake --drop-in bash --test=root octotiger@master simd_library=KOKKOS simd_extension=SCALAR build_type==RelWithDebInfo +kokkos +cuda cuda_arch=90 %gcc@11 ^hpx +generic_coroutines generator=make max_cpu_count=144 malloc=mimalloc networking=none ^silo~mpi ^cppuddle@0.3.1 number_buffer_buckets=144 ^cuda@12.3.107 ^cmake@3.26~ownlibs~ncurses ^curl@8
> cd spack-build
> make -j32
> ctest
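Because this spec string is long and easily broken when copied, it can help to assemble it from its parts. The sketch below is a hypothetical Python helper (not part of Spack or Octo-Tiger) that rebuilds exactly the spec used here. Each group starting with `^` constrains one dependency of `octotiger`, and `build_type==RelWithDebInfo` uses Spack's `==` syntax, which propagates the variant to dependencies.

```python
import shlex


def build_spec(root, root_constraints, deps):
    """Join a root spec and per-dependency constraint groups into one string.

    Each dependency group starts with the dependency name (optionally with
    version/variants attached) and is emitted as "^<dep> <constraints...>".
    """
    parts = [root, *root_constraints]
    for name, *constraints in deps:
        parts.append("^" + name)
        parts.extend(constraints)
    return " ".join(parts)


SPEC = build_spec(
    "octotiger@master",
    ["simd_library=KOKKOS", "simd_extension=SCALAR", "build_type==RelWithDebInfo",
     "+kokkos", "+cuda", "cuda_arch=90", "%gcc@11"],
    [
        ["hpx", "+generic_coroutines", "generator=make", "max_cpu_count=144",
         "malloc=mimalloc", "networking=none"],
        ["silo~mpi"],
        ["cppuddle@0.3.1", "number_buffer_buckets=144"],
        ["cuda@12.3.107"],
        ["cmake@3.26~ownlibs~ncurses"],
        ["curl@8"],
    ],
)

# Argument list for the dev-build step shown above.
DEV_BUILD = ["spack", "dev-build", "--fresh", "--until", "cmake",
             "--drop-in", "bash", "--test=root", *SPEC.split()]

# Shell-quoted form, safe to paste into an interactive shell.
print(shlex.join(DEV_BUILD))
```

The list form could be passed to `subprocess.run(DEV_BUILD, check=True)` from the source directory. The printed form is quoted so that tokens starting with `^` survive shells that treat that character specially.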

Concretized spack spec that was used:

> spack spec octotiger@master simd_library=KOKKOS simd_extension=SCALAR build_type==RelWithDebInfo +kokkos +cuda cuda_arch=90 %gcc@11 ^hpx +generic_coroutines generator=make max_cpu_count=144 malloc=mimalloc networking=none ^silo~mpi ^cppuddle@0.3.1 number_buffer_buckets=144 ^cuda@12.3.107 ^cmake@3.26~ownlibs~ncurses ^curl@8

Input spec
--------------------------------
 -   octotiger@master%gcc@11+cuda+kokkos build_type==RelWithDebInfo cuda_arch=90 simd_extension=SCALAR simd_library=KOKKOS
 -       ^cmake@3.26~ncurses~ownlibs
 -       ^cppuddle@0.3.1 number_buffer_buckets=144
 -       ^cuda@12.3.107
 -       ^curl@8
 -       ^hpx+generic_coroutines generator=make malloc=mimalloc max_cpu_count=144 networking=none
 -       ^silo~mpi

Concretized
--------------------------------
 -   octotiger@master%gcc@11.4.1~boost_multiprecision+cuda~fast_fp_contract~ipo+kokkos~kokkos_hpx_kernels~rocm~sycl build_system=cmake build_type=RelWithDebInfo cuda_arch=90 cxxstd=17 generator=make griddim=8 hydro_host_tasks=1 monopole_host_tasks=1 multipole_host_tasks=1 simd_extension=SCALAR simd_library=KOKKOS theta_minimum=0.34 arch=linux-rhel9-neoverse_n1
[+]      ^boost@1.84.0%gcc@11.4.1+atomic+chrono~clanglibcpp~container+context~contract~coroutine+date_time~debug+exception~fiber+filesystem+graph~graph_parallel~icu+iostreams~json+locale+log+math~mpi+multithreaded~nowide~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded~stacktrace+system~taggedlayout+test+thread+timer~type_erasure~versionedlayout+wave build_system=generic context-impl=fcontext cxxstd=17 patches=a440f96 visibility=hidden arch=linux-rhel9-neoverse_n1
[+]          ^bzip2@1.0.8%gcc@11.4.1~debug~pic+shared build_system=generic arch=linux-rhel9-neoverse_n1
[e]              ^diffutils@3.8%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^xz@5.4.1%gcc@11.4.1~pic build_system=autotools libs=shared,static arch=linux-rhel9-neoverse_n1
[+]          ^zlib-ng@2.1.5%gcc@11.4.1+compat+opt build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^zstd@1.5.5%gcc@11.4.1~programs build_system=makefile libs=shared,static arch=linux-rhel9-neoverse_n1
[+]      ^cmake@3.26.6%gcc@11.4.1~doc~ncurses~ownlibs build_system=generic build_type=RelWithDebInfo arch=linux-rhel9-neoverse_n1
[+]          ^curl@8.4.0%gcc@11.4.1~gssapi~ldap~libidn2~librtmp~libssh~libssh2+nghttp2 build_system=autotools libs=shared,static tls=mbedtls arch=linux-rhel9-neoverse_n1
[+]              ^mbedtls@2.28.2%gcc@11.4.1+pic build_system=makefile build_type=RelWithDebInfo libs=static arch=linux-rhel9-neoverse_n1
[+]              ^nghttp2@1.57.0%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^expat@2.5.0%gcc@11.4.1+libbsd build_system=autotools arch=linux-rhel9-neoverse_n1
[+]              ^libbsd@0.11.7%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]                  ^libmd@1.0.4%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^jsoncpp@1.9.5%gcc@11.4.1~strip build_system=meson buildtype=release default_library=shared arch=linux-rhel9-neoverse_n1
[+]              ^meson@1.2.2%gcc@11.4.1 build_system=python_pip patches=0f0b1bd,ae59765 arch=linux-rhel9-neoverse_n1
[+]                  ^py-pip@23.1.2%gcc@11.4.1 build_system=generic arch=linux-rhel9-neoverse_n1
[+]                  ^py-setuptools@68.0.0%gcc@11.4.1 build_system=generic arch=linux-rhel9-neoverse_n1
[+]                  ^py-wheel@0.41.2%gcc@11.4.1 build_system=generic arch=linux-rhel9-neoverse_n1
[+]              ^ninja@1.11.1%gcc@11.4.1+re2c build_system=generic arch=linux-rhel9-neoverse_n1
[+]                  ^re2c@2.2%gcc@11.4.1 build_system=generic arch=linux-rhel9-neoverse_n1
[+]          ^libarchive@3.7.1%gcc@11.4.1+iconv build_system=autotools compression=bz2lib,lz4,lzma,lzo2,zlib,zstd crypto=mbedtls libs=shared,static programs=none xar=expat arch=linux-rhel9-neoverse_n1
[+]              ^libiconv@1.17%gcc@11.4.1 build_system=autotools libs=shared,static arch=linux-rhel9-neoverse_n1
[+]              ^lz4@1.9.4%gcc@11.4.1+pic build_system=makefile libs=shared,static arch=linux-rhel9-neoverse_n1
[+]              ^lzo@2.10%gcc@11.4.1 build_system=autotools libs=shared,static arch=linux-rhel9-neoverse_n1
[+]          ^libuv@1.46.0%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^rhash@1.4.2%gcc@11.4.1 build_system=makefile patches=093518c,3fbfe46 arch=linux-rhel9-neoverse_n1
[+]      ^cppuddle@0.3.1%gcc@11.4.1~allocator_counters+buffer_content_recycling+buffer_recycling~enable_gpu_tests+executor_recycling+hpx~ipo build_system=cmake build_type=RelWithDebInfo generator=make max_number_gpus=1 number_buffer_buckets=144 arch=linux-rhel9-neoverse_n1
[e]      ^cuda@12.3.107%gcc@11.4.1~allow-unsupported-compilers~dev build_system=generic arch=linux-rhel9-neoverse_n1
[+]      ^gcc-runtime@11.4.1%gcc@11.4.1 build_system=generic arch=linux-rhel9-neoverse_n1
[e]      ^gmake@4.3%gcc@11.4.1~guile build_system=generic patches=599f134 arch=linux-rhel9-neoverse_n1
[+]      ^hdf5@1.14.3%gcc@11.4.1~cxx~fortran+hl~ipo~java~map~mpi+shared+szip+threadsafe+tools api=default build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel9-neoverse_n1
[+]          ^libaec@1.0.6%gcc@11.4.1~ipo+shared build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel9-neoverse_n1
[e]          ^pkgconf@1.4.2%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]      ^hpx@1.9.1%gcc@11.4.1+async_cuda+async_gpu_futures~async_mpi+cuda~examples+generic_coroutines~ipo~lci_pp_log~lci_pp_pcounter~rocm~sycl~tools build_system=cmake build_type=RelWithDebInfo cuda_arch=90 cxxstd=17 generator=make instrumentation=none malloc=mimalloc max_cpu_count=144 networking=none sycl_target_arch=none arch=linux-rhel9-neoverse_n1
[+]          ^asio@1.28.0%gcc@11.4.1~boost_coroutine~boost_regex~separate_compilation build_system=autotools cxxstd=17 arch=linux-rhel9-neoverse_n1
[e]          ^git@2.39.3%gcc@11.4.1+man+nls+perl+subtree~svn~tcltk build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^hwloc@2.9.1%gcc@11.4.1~cairo~cuda~gl~libudev+libxml2~netloc~nvml~oneapi-level-zero~opencl+pci~rocm build_system=autotools libs=shared,static arch=linux-rhel9-neoverse_n1
[+]              ^libpciaccess@0.17%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]                  ^util-macros@1.19.3%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]              ^libxml2@2.10.3%gcc@11.4.1+pic~python+shared build_system=autotools arch=linux-rhel9-neoverse_n1
[+]              ^ncurses@6.4%gcc@11.4.1~symlinks+termlib abi=none build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^mimalloc@2.1.2%gcc@11.4.1~build_tests~debug_full~ipo~local_dynamic_tls+override+padding~secure~see_asm~show_errors~skip_collect_on_exit~use_cxx~xmalloc build_system=cmake build_type=RelWithDebInfo generator=make libs=object,shared,static arch=linux-rhel9-neoverse_n1
[e]          ^python@3.10.12%gcc@11.4.1+bz2+crypt+ctypes+dbm~debug+libxml2+lzma+nis~optimizations+pic+pyexpat~pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=0d98e93,7d40923,ebdca64,f2fd060 arch=linux-rhel9-neoverse_n1
[+]      ^hpx-kokkos@0.4.0%gcc@11.4.1+cuda~ipo~rocm~sycl build_system=cmake build_type=RelWithDebInfo cuda_arch=90 cxxstd=17 future_type=polling generator=make arch=linux-rhel9-neoverse_n1
[+]      ^kokkos@4.0.01%gcc@11.4.1+aggressive_vectorization~compiler_warnings+cuda~cuda_constexpr+cuda_lambda~cuda_ldg_intrinsic~cuda_relocatable_device_code~cuda_uvm~debug~debug_bounds_check~debug_dualview_modify_check~deprecated_code~examples+hpx+hpx_async_dispatch~hwloc~ipo~memkind~numactl~openmp~openmptarget~pic~rocm+serial+shared~sycl~tests~threads~tuning+wrapper build_system=cmake build_type=RelWithDebInfo cuda_arch=90 cxxstd=17 generator=make intel_gpu_arch=none patches=b26a011 use_unsupported_sycl_arch=none arch=linux-rhel9-neoverse_n1
[+]          ^kokkos-nvcc-wrapper@4.0.01%gcc@11.4.1 build_system=generic patches=b475d96 arch=linux-rhel9-neoverse_n1
[+]      ^silo@4.11%gcc@11.4.1+fortran+fpzip+hdf5+hzip~mpi+pic+shared~silex build_system=autotools patches=251244d,451c4c5,a081263,eb2a3a0,fa050e0 arch=linux-rhel9-neoverse_n1
[e]          ^autoconf@2.71%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^autoconf-archive@2023.02.20%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[e]          ^automake@1.16.5%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[+]          ^gnuconfig@2022-09-17%gcc@11.4.1 build_system=generic arch=linux-rhel9-neoverse_n1
[e]          ^libtool@2.4.6%gcc@11.4.1 build_system=autotools arch=linux-rhel9-neoverse_n1
[e]          ^m4@1.4.18%gcc@11.4.1+sigsegv build_system=autotools patches=3877ab5,fc9b616 arch=linux-rhel9-neoverse_n1
[e]          ^perl@5.34.0%gcc@11.4.1~cpanm+opcode+open+shared+threads build_system=generic arch=linux-rhel9-neoverse_n1
[+]          ^readline@8.2%gcc@11.4.1 build_system=autotools patches=bbf97f1 arch=linux-rhel9-neoverse_n1
[+]      ^vc@1.4.1%gcc@11.4.1~ipo build_system=cmake build_type=RelWithDebInfo generator=make arch=linux-rhel9-neoverse_n1
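When comparing a concretized spec against the requested one, a small helper can pull the variants out of individual lines of this output, for example to check that the root package was concretized with `cuda_arch=90` and `simd_extension=SCALAR`. The sketch below is a hypothetical minimal parser, not Spack's own spec parser. It only handles the token shapes that appear in this output: the status prefix (` - `, `[+]`, `[e]`), the `^` dependency marker, the package name, `+`/`~` boolean variants, `key=value` pairs and the propagating `key==value` form.

```python
import re

# Boolean variants look like "+cuda" (enabled) or "~rocm" (disabled).
BOOL_VARIANT = re.compile(r"([+~])([A-Za-z0-9_-]+)")
# Leading status markers in `spack spec` output: " - ", "[+]", "[e]", then "^".
PREFIX = re.compile(r"^\s*(?:\[[^\]]*\]|-)?\s*\^?")


def parse_spec_line(line: str) -> dict:
    """Split one spec line into name, boolean variants and key=value variants."""
    body = PREFIX.sub("", line, count=1)
    tokens = body.split()
    name = re.match(r"[A-Za-z0-9_-]+", tokens[0]).group(0)
    flags, values, propagated = {}, {}, set()
    for tok in tokens:
        if "=" in tok:
            key, _, value = tok.partition("=")
            if value.startswith("="):  # "key==value" propagates to dependencies
                value = value[1:]
                propagated.add(key)
            values[key] = value
        else:
            # Name, @version and %compiler contain no '+'/'~' in this output,
            # so every match here is a boolean variant.
            for sign, flag in BOOL_VARIANT.findall(tok):
                flags[flag] = sign == "+"
    return {"name": name, "flags": flags, "values": values,
            "propagated": propagated}


root = parse_spec_line(
    " -   octotiger@master%gcc@11.4.1~boost_multiprecision+cuda~rocm~sycl "
    "cuda_arch=90 simd_extension=SCALAR arch=linux-rhel9-neoverse_n1"
)
print(root["flags"]["cuda"], root["values"]["cuda_arch"])
```

Checking a handful of values this way is quicker than scanning the full tree by eye when a build behaves differently from the one reported here.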

@diehlpk Can you try to create a build according to the instructions above? That should resolve all the problems you have reported so far. It works fine on the Grace Hopper machine I tested it on: all tests pass, and performance seems reasonable at first sight. If it does not work, please post the error and the concretized Spack spec.

If required, I can also post the spec and build steps for an SVE CPU-only build on Grace Grace machines.

diehlpk commented 5 months ago

I can confirm that it built for me on NVIDIA Grace Hopper.