Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
We discovered that ROCm 5.7.1 and newer hang during multithreaded Geant4 runs. The problem appears to be either a regression in asynchronous memory allocation that introduces a race condition or a bug in Thrust: in some cases, a kernel launch on one thread combined with an asynchronous malloc/free on another causes the application to lock up.
TODO: fill this in from OLCF help tickets