Add an interface for switching between different AdePT backends. In this way, the AsyncAdePT implementation can share scoring and integration code with example1.
Split the transport from the integration-related parts by creating a
new library.
Create a transport abstraction, so AdePTTrackingManager is independent
of the transport implementation.
Add thread and event IDs to the transport interface. These are
necessary for the async transport implementation.
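A minimal sketch of what such a transport abstraction could look like, with thread and event IDs passed through the interface. All names here (`TransportInterface`, `RecordingTransport`, `AddTrack`, `Shower`) are hypothetical and only illustrate the idea; they are not taken from the AdePT sources.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical track record; the real AdePT track layout is not shown here.
struct TrackData {
  int pdg;
  double energy;
  int threadId;
  int eventId;
};

// Hypothetical abstract transport. A tracking manager would only see this type.
class TransportInterface {
public:
  virtual ~TransportInterface() = default;
  // Thread and event IDs travel with each track, so an asynchronous backend
  // can later attribute results to the correct worker and event.
  virtual void AddTrack(int pdg, double energy, int threadId, int eventId) = 0;
  // Transport everything buffered for one (event, thread) pair.
  virtual void Shower(int eventId, int threadId) = 0;
};

// Toy backend that only counts tracks instead of transporting them.
class RecordingTransport final : public TransportInterface {
public:
  void AddTrack(int pdg, double energy, int threadId, int eventId) override {
    fBuffer.push_back(TrackData{pdg, energy, threadId, eventId});
  }

  void Shower(int eventId, int threadId) override {
    auto belongs = [eventId, threadId](const TrackData &t) {
      return t.eventId == eventId && t.threadId == threadId;
    };
    // Move matching tracks to the end, count them, then drop them.
    auto newEnd = std::remove_if(fBuffer.begin(), fBuffer.end(), belongs);
    fTransported += static_cast<std::size_t>(std::distance(newEnd, fBuffer.end()));
    fBuffer.erase(newEnd, fBuffer.end());
  }

  std::size_t Buffered() const { return fBuffer.size(); }
  std::size_t Transported() const { return fTransported; }

private:
  std::vector<TrackData> fBuffer;
  std::size_t fTransported = 0;
};
```

Code that holds only a `TransportInterface &` does not need to change when the backend is swapped.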
Start to enumerate tracks in the tracking manager. This can be used to
reproducibly seed the AdePT random sequences.
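As an illustration of why enumerating tracks helps, the sketch below derives a seed from an event ID and a running track number. The counter class, the function names and the choice of the splitmix64 finalizer as mixing function are assumptions made for this example; the actual seeding scheme in AdePT may differ.

```cpp
#include <cstdint>

// Hypothetical per-event track counter, as a tracking manager might keep.
class TrackEnumerator {
public:
  std::uint64_t Next() { return fNext++; }
  void Reset() { fNext = 0; } // e.g. at the start of each event
private:
  std::uint64_t fNext = 0;
};

// splitmix64 finalizer: a bijective 64-bit mixing function.
// Chosen for illustration only.
constexpr std::uint64_t Mix(std::uint64_t x)
{
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// The seed depends only on (eventId, trackId), not on thread scheduling,
// so repeated runs give each track the same random sequence.
constexpr std::uint64_t TrackSeed(std::uint64_t eventId, std::uint64_t trackId)
{
  return Mix(Mix(eventId) ^ trackId);
}
```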
Add some const declarations for the default AdePT implementation.
Use a factory function to instantiate AdePT. This way, different
AdePT implementations can be used without changing code in the tracking
manager or in AdePTPhysics.
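The factory idea can be sketched as follows. The class names, the `MakeTransport` function and the string-based backend selection are hypothetical and chosen for this example; the factory in this change may select the implementation differently (for instance at link time).

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical common base class for all backends.
class TransportBase {
public:
  virtual ~TransportBase() = default;
  virtual std::string Name() const = 0;
};

class SyncTransport final : public TransportBase {
public:
  std::string Name() const override { return "sync"; }
};

class AsyncTransport final : public TransportBase {
public:
  std::string Name() const override { return "async"; }
};

// Hypothetical factory: callers never name a concrete backend type,
// so adding a backend only touches this function.
inline std::unique_ptr<TransportBase> MakeTransport(const std::string &backend)
{
  if (backend == "sync") return std::make_unique<SyncTransport>();
  if (backend == "async") return std::make_unique<AsyncTransport>();
  throw std::invalid_argument("unknown transport backend: " + backend);
}
```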
Replace a few includes with forward declarations.
Fix device link errors that can occur when a symbol is used in multiple
CUDA translation units.
Refactor the processing of hits.
Instead of processing hits by passing a pointer/reference to a HostScoring instance, a loop over iterators to hits is used. In this way, hit scoring is decoupled from the specific implementation of HostScoring, and all classes with the same interface as the original GPUHit can be used for scoring. This is done to facilitate hit scoring in the AsyncExample.
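A rough sketch of the iterator-based approach, assuming a hit type with `volumeId` and `energyDeposit` members. `DemoHit`, `ProcessHits` and `EnergyPerVolume` are hypothetical names for this example and do not reproduce the real GPUHit layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hit type; any type with the same members works.
struct DemoHit {
  int volumeId;
  double energyDeposit;
};

// The loop only needs an iterator range. It does not know which container
// or scoring class produced the hits.
template <typename HitIterator, typename Scorer>
void ProcessHits(HitIterator first, HitIterator last, Scorer &scorer)
{
  for (; first != last; ++first) {
    scorer.Score(*first);
  }
}

// Toy scorer that accumulates deposited energy per volume.
struct EnergyPerVolume {
  std::vector<double> energy;

  explicit EnergyPerVolume(std::size_t nVolumes) : energy(nVolumes, 0.0) {}

  template <typename Hit>
  void Score(const Hit &hit)
  {
    // at() throws std::out_of_range for an invalid volume ID.
    energy.at(static_cast<std::size_t>(hit.volumeId)) += hit.energyDeposit;
  }
};
```

Because `ProcessHits` is a template, a different backend can supply its own hit container without the scoring code changing.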
Move Geant4 objects into the .cpp to make the integration headers simpler.
Place temporary scoring objects into a struct to work around G4's pool allocators. This prevents a destruction order fiasco (where the pool is gone but the object isn't), and keeps the scoring objects closer together in memory. Unfortunately, two objects need to leak, since they are allocated in G4 pools and their handles don't support stack allocation.
Improve const correctness in a few places.
Add information about threadID and eventID to the scoring template. This information is required for AsyncAdePT to score correctly, but is unused in example1 for now.
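To illustrate why the IDs matter when hits from several events arrive interleaved, here is a small sketch that keys the accumulated energy on (threadID, eventID). The class and member names are hypothetical, and the actual scoring template in this change has a different shape.

```cpp
#include <map>
#include <utility>

// Hypothetical hit type for this example.
struct DemoHit {
  double energyDeposit;
};

// Toy scoring class: each (thread, event) pair gets its own accumulator,
// so hits of different events can arrive in any order.
class PerEventScoring {
public:
  template <typename Hit>
  void ProcessHit(const Hit &hit, int threadId, int eventId)
  {
    fEnergy[std::make_pair(threadId, eventId)] += hit.energyDeposit;
  }

  // Returns 0 if nothing was scored for this (thread, event) pair.
  double Energy(int threadId, int eventId) const
  {
    auto it = fEnergy.find(std::make_pair(threadId, eventId));
    return it == fEnergy.end() ? 0.0 : it->second;
  }

private:
  std::map<std::pair<int, int>, double> fEnergy;
};
```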
These two changes don't seem to impact the run times. Here is a diff of the full sorted output of example1 with 8 ttbar events on 4 threads (including a diff of the energy depositions):
@@ -5104,7 +5092,7 @@
proton: 7 eKin (GeV): 1720.36 (total) 245.765 (avg)
Reading cms2018_sd.gdml transiently on CPU for Geant4 ...
References : NIM A 506 (2003), 250-303
-Run time: 159.599
+Run time: 160.097
sigma+: 1 eKin (GeV): 12.1624 (total) 12.1624 (avg)
sigma-: 1 eKin (GeV): 21.225 (total) 21.225 (avg)
sigma-: 2 eKin (GeV): 28.236 (total) 14.118 (avg)