LDMX-Software / Ecal

Software module for digitizing, reconstructing, and vetoing within the ECal.

Some verbosity dependent seg faults in Ecal processors #40

Open bryngemark opened 2 years ago

bryngemark commented 2 years ago

On and off, jobs abort with "terminate called after throwing an instance of 'std::bad_alloc'". I've seen this when running the ECal veto and, most recently, when running a v3.0.0 re-digi of EcalSimHits in files produced with v2.3.0. It doesn't happen on all files, but on maybe 75% of them in my tests so far.

The weird thing is that if the terminal printout verbosity is increased (going from the default p.termLogLevel = 2 to 0 or even 1), the problem disappears. But that only holds if a log file is specified too, with log level 0 or 1 (at 2, the crashes reappear).
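To make the combinations above easier to scan, here is a minimal sketch in plain Python that tabulates the outcome described for each pairing of terminal log level and log-file level. The helper name `reported_outcome` and the labels "ok", "crash", and "unreported" are hypothetical, introduced only for illustration. Treating "default terminal level, no log file" as the crashing baseline is an assumption, and the case of terminal level 2 with a log file is left as "unreported" because it is not described above.

```python
from itertools import product

# Log levels mentioned in the report; 2 is the default terminal level.
LEVELS = (0, 1, 2)


def reported_outcome(term_level, file_level):
    """Summarize the behaviour described in this report (hypothetical helper).

    term_level: terminal log level (0, 1, or 2).
    file_level: log-file level (0, 1, or 2), or None if no log file is set.
    Returns "ok", "crash", or "unreported".
    """
    if term_level in (0, 1):
        # Increased terminal verbosity only helps when a log file is
        # also configured with level 0 or 1.
        if file_level in (0, 1):
            return "ok"
        return "crash"  # no log file, or log file at level 2
    # Terminal level 2 (the default).
    if file_level is None:
        return "crash"  # assumed baseline: the setup where crashes were seen
    return "unreported"  # this combination is not described in the report


# Build the full grid: 3 terminal levels x (no log file + 3 file levels).
matrix = {
    (term, file): reported_outcome(term, file)
    for term, file in product(LEVELS, (None,) + LEVELS)
}

for (term, file), outcome in matrix.items():
    file_label = "no log file" if file is None else f"file level {file}"
    print(f"term level {term}, {file_label}: {outcome}")
```

Running each "unreported" combination on one of the failing files would fill in the remaining gaps in the grid.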

To reproduce: run the ecal digi parts of this template config on e.g. this input file:
/nfs/slac/g/ldmx/data/mc20/v12/4.0GeV/v2.3.0-batch24/mc_v12-4GeV-1e-ecal_photonuclear_run230005_t1608608718.root
with the pro_v3.0.0 singularity image:
/nfs/slac/g/ldmx/production/singularityImages/ldmx-pro_v3.0.0-gLDMX.10.2.3_v0.4-r6.22.00-onnx1.3.0-xerces3.2.3-ubuntu18.04.sif
using singularity version 3.8.6-1.el7 at SLAC.

Curiosity: Tom was not able to reproduce this locally: jobs ran fine regardless of verbosity. I have been able to reproduce it with LDCS on a number of files from -batch24 above.

tomeichlersmith commented 1 year ago

This may be related to the recent Framework patch addressing an extra copy that was causing memory to be mismanaged: https://github.com/LDMX-Software/Framework/pull/59

I was not originally able to reproduce this error, but it may be time to try again to see if that patch resolved the issue.