eic / EICrecon

EIC Reconstruction - JANA based
https://eic.github.io/EICrecon
GNU Lesser General Public License v3.0

Enabling ONNX (InclusiveKinematicsML) is unstable #1394

Closed: wdconinc closed this issue 1 month ago

wdconinc commented 4 months ago

Environment:

Steps to reproduce:

  1. Wait for some physics_benchmarks jobs to fail at random.

Expected Result:

All physics benchmarks should complete.

Actual Result:

EICrecon hangs at the end of an otherwise complete run, even when no errors or warnings were emitted. It looks as though the ONNX Runtime (ort) session cannot close. This only seems to happen on eicweb; it has not been observed in the GitHub pipelines (which run over a smaller number of events). It affects different jobs on different runs, so it does not appear to be connected to the specific events processed.

simonge commented 2 months ago

I see this maybe 50% of the time when just running over 100 events in eic-shell on my machine. All I am doing is loading an ort session; I never even use it.

simonge commented 2 months ago

Adding an explicit destructor to the algorithm seems to fix the issue, e.g.:

  ~ChargeSharingDigitizationML() {
    m_session->release();
  }

That said, needing this at all suggests a more fundamental problem.