XENON1T / pax

The XENON1T raw data processor [deprecated]
BSD 3-Clause "New" or "Revised" License

MemoryError on OSG when initializing the Simulator class #727

Closed ershockley closed 5 years ago

ershockley commented 5 years ago

During OSG processing we are seeing lots of MemoryErrors that occur when initializing the Simulator class. See here for an example traceback.

We can resolve this by requesting more memory when submitting the jobs, but that gets us fewer slots on the grid and so can slow down processing. Would one of the following options be possible? @mcfatelin @jhowl01 @JelleAalbers @zhut19 I know you are some of the fax experts, so let me know what you think.

I'm willing to do the work but would like to get feedback from experts first if possible. Thanks!

feigaodm commented 5 years ago

@ershockley I guess we can skip TopPatternFunctionFit if it helps. We only use TopPatternFit in the analysis anyway.

ershockley commented 5 years ago

At least from what I can see now, the error only occurs in the Simulator class, which loads a different file than TopPatternFunctionFit. I'm not sure whether we would see the same thing in TopPatternFunctionFit if we removed the Simulator initialization, though.

ershockley commented 5 years ago

I guess my main question is: do we need to initialize the waveform simulator for normal data processing?

feigaodm commented 5 years ago

A few parameters and maps are defined in the waveform simulator class, so I guess we will need them. But the FAX experts can provide better guidance on how to solve the problem. My suggestion was to reduce usage of TPF by a factor of 2, since you mentioned "PatternFit" might introduce the problem.

zhut19 commented 5 years ago

As far as I can tell, the self.simulator of a core.Processor instance is not used in any of the plugins during processing (_base.ini, XENON1T.ini). If there's no impact on processing, we can move WaveformSimulator here and here to Simulation.ini.

mcfatelin commented 5 years ago

Maybe it is easier to have a flag that controls the initialization of the waveform simulator, and set it to true only in the Simulation config. Some of the parameters defined in the config of WaveformSimulator may still be useful.
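
A minimal sketch of the flag idea, to make the proposal concrete. Everything here is an assumption for illustration: `init_simulator` is not an existing pax option, and `Processor` and `fake_simulator_factory` are stripped-down stand-ins, not the real pax classes.

```python
class Processor:
    """Hypothetical, stripped-down stand-in for a processor class."""

    def __init__(self, config, simulator_factory):
        self.config = config
        # 'init_simulator' is an assumed flag name, not an existing pax option.
        # It defaults to False so normal data processing skips the simulator.
        if config.get('init_simulator', False):
            self.simulator = simulator_factory(config)
        else:
            self.simulator = None


def fake_simulator_factory(config):
    # Placeholder for the expensive, memory-hungry construction step.
    return {'maps_loaded': True}


# Data-processing style config: the simulator is never built.
data_proc = Processor({'init_simulator': False}, fake_simulator_factory)

# Simulation style config: the simulator is built as before.
sim_proc = Processor({'init_simulator': True}, fake_simulator_factory)
```

With a flag like this, any code that reads from the `simulator` attribute would need to handle `None` when the flag is off.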

JelleAalbers commented 5 years ago

The waveform simulator is initialized because it loads/groks the various correction map files. The correction plugins then 'steal' this info from the waveform simulator, see e.g. https://github.com/XENON1T/pax/blob/master/pax/plugins/peak_processing/PeakAreaCorrections.py#L15 and https://github.com/XENON1T/pax/blob/master/pax/plugins/posrec/TopPatternFit.py#L132. It's a bit messy but does ensure all resources get loaded only once.

Thus, if you disable the simulator you'd have to disable most of the high-level plugins too. Maybe that's fine since many of these have moved to hax. But a less drastic solution would be to set the S2 pattern map to None in the config. You'd only have to disable TopPatternFit in this case.
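
Roughly, the "load once, borrow everywhere" pattern and the effect of a None map look like the sketch below. The names are hypothetical (`MapOwner`, `PatternFitLikePlugin`, the `s2_patterns_file` key, and `fake_load_map` are all made up for illustration); this is not the actual pax code or its config keys.

```python
class MapOwner:
    """Hypothetical stand-in for the object that loads the map files once."""

    def __init__(self, config, load_map):
        # 's2_patterns_file' is an assumed key name used only for illustration.
        filename = config.get('s2_patterns_file')
        # A value of None means: do not load this map at all.
        self.s2_patterns = load_map(filename) if filename is not None else None


class PatternFitLikePlugin:
    """Hypothetical plugin that borrows the already-loaded map."""

    def __init__(self, owner):
        if owner.s2_patterns is None:
            raise RuntimeError("S2 pattern map not loaded; disable this plugin")
        # Keep a reference to the same object, so nothing is loaded twice.
        self.patterns = owner.s2_patterns


load_calls = []


def fake_load_map(filename):
    # Records each load so we can see how often the file is read.
    load_calls.append(filename)
    return {'source': filename}


owner = MapOwner({'s2_patterns_file': 'example_map.json'}, fake_load_map)
plugin_a = PatternFitLikePlugin(owner)
plugin_b = PatternFitLikePlugin(owner)

# With the map set to None, nothing is loaded and the plugin must be disabled.
light_owner = MapOwner({'s2_patterns_file': None}, fake_load_map)
```

In this sketch two plugins share one loaded map, and a None entry skips the load entirely, at the cost of the one plugin that needs it.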

I'm surprised it crashes on loading the S2 per-PMT map though. This is the same for all nearby runs, so why would it crash for some runs and not others? Perhaps it is crashing randomly, e.g. based on which node we get allocated? If there is not even enough memory on a node to initialize pax, I doubt there'd be enough to actually start processing events (especially if a big event comes along).
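
One way to check the "depends on the node" hypothesis would be to log the hostname and peak memory at a few points in the job. A minimal sketch using only the Python standard library (Unix only, since it relies on the `resource` module); the helper names `peak_memory_mb` and `memory_report` are made up here and are not part of pax:

```python
import resource
import socket
import sys


def peak_memory_mb():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in KiB on Linux.
    if sys.platform == 'darwin':
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


def memory_report(stage):
    """One log line tying a processing stage to the node and its peak memory."""
    return "%s host=%s peak_rss_mb=%.1f" % (
        stage, socket.gethostname(), peak_memory_mb())


# Example: call this before and after the expensive initialization step.
line = memory_report("before_init")
```

Collecting these lines per job would show whether the failures cluster on particular hosts or sites rather than on particular runs.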

ershockley commented 5 years ago

Thanks for the comments, all. I was also confused why it was only happening to some runs, so I looked a bit more into it. It seems to be happening only on two specific OSG sites, so I don't think it's a pax issue anymore but instead maybe a problem in the memory allocation of those sites. I'm going to close the issue.