Open benkrikler opened 10 years ago
I've spent a long long time now digging around in valgrind output trying to get to the bottom of things.
I suspect that I'm running on a system with little memory available; indeed, ulimit suggests 3 GB. If I decrease the number of bins in PlotPedestalAndNoise, I'm able to analyse the entire run.
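To put rough numbers on why the bin count matters, here is a back-of-the-envelope sketch. It assumes the module books 2D histograms with a fixed number of bytes per bin; the function names and any bin counts you plug in are hypothetical, not taken from PlotPedestalAndNoise itself. The one ROOT-specific fact used is that each axis carries an extra underflow and overflow bin.

```cpp
#include <cstdint>

// Approximate bin-content storage for a single 2D histogram.
// ROOT adds one underflow and one overflow bin per axis, hence the +2.
// bytesPerBin is an assumption: e.g. 4 for float or int contents, 8 for double.
constexpr std::uint64_t EstimateHist2DBytes(std::uint64_t nBinsX,
                                            std::uint64_t nBinsY,
                                            std::uint64_t bytesPerBin) {
    return (nBinsX + 2) * (nBinsY + 2) * bytesPerBin;
}

// Total for several identically binned histograms, e.g. one per channel
// (the per-channel layout is an assumption for illustration).
constexpr std::uint64_t EstimateTotalBytes(std::uint64_t nHistograms,
                                           std::uint64_t nBinsX,
                                           std::uint64_t nBinsY,
                                           std::uint64_t bytesPerBin) {
    return nHistograms * EstimateHist2DBytes(nBinsX, nBinsY, bytesPerBin);
}
```

Because the cost scales with the product of the bin counts, halving the bins on both axes cuts the footprint by roughly a factor of four, which would be consistent with the run completing once the binning is reduced.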
For what it's worth, the largest leak I could spot when running with just this module, and coming from our own code (as opposed to ROOT's), was 130 KB. It really does seem, then, that what we're doing is just very memory intensive.
It would be interesting to profile each module and check it all properly, but I'll leave that for another day and close this for now.
If you suspect it's a resource shortage but aren't sure, try starting the run at event 100 or so. If memory is the problem, the failure point should move forward by about 100 events. (Also try starting after the problematic event.)
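The logic of that test can be written down as a small sketch. Everything here (the enum, the function name, the tolerance parameter) is hypothetical and only illustrates the reasoning: a failure that stays at the same absolute event points at the data, while one that shifts by the starting offset points at accumulated resource use.

```cpp
#include <cstdint>
#include <cstdlib>

enum class FailureKind { ResourceExhaustion, DataDependent, Inconclusive };

// firstFail:  event at which the job failed when started from event 0
// offset:     event at which the second job was started
// secondFail: event at which the second job failed
// tolerance:  slack in events, since the shift is only expected to be approximate
FailureKind ClassifyFailure(std::int64_t firstFail, std::int64_t offset,
                            std::int64_t secondFail, std::int64_t tolerance) {
    // Same absolute event both times: something about that event is the trigger.
    if (std::llabs(secondFail - firstFail) <= tolerance) {
        return FailureKind::DataDependent;
    }
    // Failure moved forward by roughly the offset: the job dies after
    // processing a similar number of events, which suggests exhaustion.
    if (std::llabs(secondFail - (firstFail + offset)) <= tolerance) {
        return FailureKind::ResourceExhaustion;
    }
    return FailureKind::Inconclusive;
}
```

The tolerance should be kept well below the offset; otherwise the two cases cannot be told apart.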
I tried that previously and it had no impact, so I concluded then that it was a resource issue as well. I haven't tried it with this one, though, so if it bites me again I'll take a look.
This module crashed again when running on the batch system for production (issue #166). The problem looks the same as this one: a bad_alloc is thrown with no other warning. I'm not sure how the batch system handles memory when many processes are running simultaneously, but if they share the memory then running fewer jobs at once could help.
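Since the exception arrives with no other warning, one option is to catch it around the per-entry processing and log which entry was being handled. The following is a minimal sketch of that idea using only the standard library; the function name and the callback signature are hypothetical and do not reflect the framework's actual event loop or the suggestions in #175.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <new>

// Calls processEntry for every entry in [first, last).
// Returns the first entry at which std::bad_alloc was thrown, or -1 if none.
// processEntry stands in for whatever the real per-entry module call is.
std::int64_t FindFirstBadAllocEntry(
    std::int64_t first, std::int64_t last,
    const std::function<void(std::int64_t)>& processEntry) {
    for (std::int64_t entry = first; entry < last; ++entry) {
        try {
            processEntry(entry);
        } catch (const std::bad_alloc& e) {
            // Report the entry so the failure is no longer silent.
            std::cerr << "bad_alloc while processing entry " << entry
                      << ": " << e.what() << '\n';
            return entry;
        }
    }
    return -1;
}
```

This only tells you where the allocation failed, not which earlier allocations used up the memory, so it complements rather than replaces a memory profile.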
I'm trying to process an entire run, and after event 119 I get a bad_alloc exception. My modules file is:
so the only possible source of this is the PlotPedestalAndNoise module.
If I put in the suggestions from #175, I confirm this and it's clear it's coming from entry 119.
The backtrace I get if I run in gdb is:
so I think the actual culprit is inlined somewhere, but I'm not sure where. There are only a few occurrences of `new` from what I can see, and I don't think they're being called before this crashes.