celeritas-project / celeritas

Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
https://celeritas-project.github.io/celeritas/user/index.html

Update Frontier installation #1208

Closed sethrj closed 2 months ago

sethrj commented 3 months ago

This updates the build on Frontier to use the new hep143 allocation and installation with ROCm 5.7.1.

The only weird thing was that Thrust now assumes it's being built for CUDA when a file is compiled by clang and pulls in Thrust via device_runtime_api.h:

In file included from /ccs/home/s3j/Code/celeritas-frontier/src/corecel/sys/Device.cc:21:
In file included from /ccs/home/s3j/Code/celeritas-frontier/src/corecel/device_runtime_api.h:28:
In file included from /opt/rocm-5.7.1/include/thrust/mr/memory_resource.h:25:
In file included from /opt/rocm-5.7.1/include/thrust/detail/config/memory_resource.h:22:
In file included from /opt/rocm-5.7.1/include/thrust/detail/alignment.h:24:
/opt/rocm-5.7.1/include/thrust/detail/type_traits.h:31:10: fatal error: 'cuda/std/type_traits' file not found
#include <cuda/std/type_traits>
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

esseivaju commented 2 months ago

Are you using clang directly or hipcc? Looking at rocThrust's compiler.h and device_system.h, if __HIP__ isn't defined then it picks CUDA. Wouldn't you also have to define __THRUST_DEVICE_SYSTEM_NAMESPACE?

sethrj commented 2 months ago

@esseivaju This was happening through the .cc files compiled by clang++. Thrust was setting THRUST_DEVICE_COMPILER to THRUST_DEVICE_COMPILER_CLANG, and then defaulting THRUST_DEVICE_SYSTEM to THRUST_DEVICE_SYSTEM_CUDA. By overriding THRUST_DEVICE_SYSTEM in device_runtime_api.h we give Thrust the correct "device system", and it then sets __THRUST_DEVICE_SYSTEM_NAMESPACE automatically.

The change is only to provide Thrust more information when going into device_system.h, not to replace that header.
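
To illustrate the fallback described above, here is a minimal self-contained sketch. It does not include Thrust: every DEMO_* macro and the selected_namespace() helper are hypothetical stand-ins that only mimic the shape of the selection logic (device compiler, then device system, then namespace) as described in this thread. The numeric values are arbitrary.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the library's enumeration macros.
#define DEMO_DEVICE_SYSTEM_CUDA 1
#define DEMO_DEVICE_SYSTEM_HIP 2
#define DEMO_DEVICE_COMPILER_CLANG 10
#define DEMO_DEVICE_COMPILER_HIP 11

// A host-only .cc file built by plain clang++ is detected as "clang",
// not as a HIP device compiler.
#define DEMO_DEVICE_COMPILER DEMO_DEVICE_COMPILER_CLANG

// --- The including header: pre-define the device system (the override) ---
// Removing this block lets the fallback below choose CUDA instead.
#ifndef DEMO_DEVICE_SYSTEM
#    define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_HIP
#endif

// --- The library's config header: only picks a default if none is set ---
#ifndef DEMO_DEVICE_SYSTEM
#    if DEMO_DEVICE_COMPILER == DEMO_DEVICE_COMPILER_HIP
#        define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_HIP
#    else
#        define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_CUDA
#    endif
#endif

// The namespace macro is derived from the device system, so the override
// above does not need to set it separately.
#if DEMO_DEVICE_SYSTEM == DEMO_DEVICE_SYSTEM_HIP
#    define DEMO_DEVICE_SYSTEM_NAMESPACE hip
#else
#    define DEMO_DEVICE_SYSTEM_NAMESPACE cuda
#endif

// Two-level stringification so the macro is expanded before being quoted.
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

// Return the name of the namespace selected at preprocessing time.
inline std::string selected_namespace()
{
    return DEMO_STR(DEMO_DEVICE_SYSTEM_NAMESPACE);
}

// Sanity check: with the override in place, the HIP backend is selected
// even though the "device compiler" is plain clang.
inline void check_override()
{
    assert(selected_namespace() == "hip");
}
```

In this sketch the override only supplies one extra piece of information before the config logic runs; the derived namespace follows from it, which matches the point that the change does not replace device_system.h.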

sethrj commented 2 months ago

OLCF recommends using their wacky Cray compiler wrappers... and apparently those forward directly to LLVM.