celeritas-project / celeritas

Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
https://celeritas-project.github.io/celeritas/

Clear along-step action when track is marked as errored #1377

Closed amandalund closed 3 months ago

amandalund commented 3 months ago

With cms2018 we were occasionally hitting an assertion failure:

{
  "condition": "AppliesValid{}(sim) == static_cast<bool>(sim.along_step_action())",
  "context": {
    "dir": [
      -0.05499329494344731,
      4.7255830867030864e-05,
      0.9984867226348828
    ],
    "energy": [
      0.15909047775592167,
      "MeV"
    ],
    "event": 6,
    "label": "along-step-neutral",
    "num_steps": 25,
    "parent": 15187800,
    "particle": 1,
    "pos": [
      -34.997497048433964,
      -96.15483293113718,
      1131.1294534893607
    ],
    "thread": 3673,
    "track": 15189690,
    "track_slot": 3673,
    "type": "KernelContextException",
    "volume": 4070
  },
  "file": "/home/alund/celeritas_project/celeritas/src/celeritas/global/detail/TrackExecutorImpl.hh",
  "line": 64,
  "type": "DebugError",
  "which": "precondition failed"
}

It turns out this happens in the unusual case where a track gets stuck and is marked as errored in the PropagationApplier. Because the track still has a valid along-step action ID but now has an invalid status, the assertion in IsAlongStepActionEqual fails. This change simply clears the along-step action when a track is marked as errored.
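To make the failure mode concrete, here is a minimal standalone sketch of the invariant in the failed precondition and of the fix described above. The types `SimState` and `ActionId` and the helpers `applies_valid`, `precondition_holds`, and `mark_errored` are hypothetical stand-ins, not the Celeritas classes, and the sketch assumes that only "alive" tracks count as valid for the along-step kernel.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the Celeritas types.
enum class TrackStatus { inactive, alive, errored, killed };

struct ActionId
{
    int value = -1;  // -1 means "no action assigned"

    ActionId() = default;
    explicit ActionId(int v) : value(v) {}
    explicit operator bool() const { return value >= 0; }
};

struct SimState
{
    TrackStatus status = TrackStatus::inactive;
    ActionId along_step_action;
};

// Assumption: a track applies to the along-step kernel only when alive.
inline bool applies_valid(SimState const& sim)
{
    return sim.status == TrackStatus::alive;
}

// Mirrors the shape of the failed condition in the error output:
// validity of the status must agree with having an along-step action.
inline bool precondition_holds(SimState const& sim)
{
    return applies_valid(sim) == static_cast<bool>(sim.along_step_action);
}

// Mark a stuck track as errored and clear its along-step action in the
// same place, so the status and the action cannot get out of sync.
inline void mark_errored(SimState& sim)
{
    sim.status = TrackStatus::errored;
    sim.along_step_action = ActionId{};
    assert(precondition_holds(sim));
}
```

Changing only the status leaves the action ID set, which is the mismatch the assertion catches; updating both fields in one helper keeps them consistent.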

amandalund commented 3 months ago

Will do! I think we'll also need to prevent errored tracks from applying eloss, right? It looks like in this failure the stuck track is marked as errored in the propagation applier but then still has energy loss applied, loses all its energy, and has its "errored" status changed to "killed".
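The status overwrite described here can be sketched in isolation. The names below (`Status`, `TrackState`, `apply_eloss_unguarded`, `apply_eloss`) are hypothetical and the energy-loss rule is deliberately simplified; this only illustrates why a status guard is needed and is not the Celeritas implementation.

```cpp
// Hypothetical stand-ins for illustration only; not the Celeritas types.
enum class Status { alive, errored, killed };

struct TrackState
{
    Status status = Status::alive;
    double energy = 0;  // [MeV]
};

// Without a guard: energy loss is applied regardless of status, and a
// track that reaches zero energy is relabeled as killed.
inline void apply_eloss_unguarded(TrackState& t, double eloss)
{
    t.energy = (eloss < t.energy) ? t.energy - eloss : 0;
    if (t.energy == 0)
    {
        t.status = Status::killed;
    }
}

// With a guard: tracks that are not alive (including errored ones) are
// left untouched, so the "errored" status survives the step.
inline void apply_eloss(TrackState& t, double eloss)
{
    if (t.status != Status::alive)
    {
        return;
    }
    apply_eloss_unguarded(t, eloss);
}
```

In the unguarded version an errored track that loses all of its energy ends the step as "killed", which hides the original error; the guarded version preserves both the status and the energy of the errored track.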

amandalund commented 3 months ago

Should we also exclude errored tracks from the time/track updaters?

sethrj commented 3 months ago

@amandalund Nice find! The TrackUpdater already skips most of the work if the track is "alive", but if we're going to skip the entire along-step (time/step updates and all) for a particle that errors during pre-step, then it would be more consistent to skip the updates if it fails along the step. If we are skipping the update, maybe we can also kill this condition: https://github.com/celeritas-project/celeritas/blob/75d6f608a2acb156d6fdfda233003afc8c996af2/src/celeritas/global/alongstep/detail/TrackUpdater.hh#L42-L43

amandalund commented 3 months ago

Updated the updaters, but I think we'll need to keep that condition since looping tracks aren't "errored".