Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
In an earlier iteration of Celeritas, we pushed all physics "interactions" to a single vector (one per track) and then applied them all simultaneously. When we changed the code so that the `InteractionApplier` updates the track directly, we saw slightly worse performance but simpler logic.
With @esseivaju's async allocators, I think we should consider revisiting this: asynchronously allocate space for secondaries and interactions between the `pre-post` and `post` steps, have a `post-post` kernel update all the tracks with their interactions at once, and deallocate the buffers afterward. This would also slightly simplify the logic in the `PreStepExecutor`, which currently requires launching on all threads to reset the secondary initializer count. I think it should also improve kernel occupancy (and reduce code size) for the model kernels.