Open kawaho opened 4 months ago
Oh, that's not a serious warning; you may ignore it. I've not seen any serious slowdown with graphs that are hundreds of MB in size.
The large memory usage you're observing is coming from another source.
About the large memory usage: the number of workers whose particularly high memory usage causes the failure is exactly the number of correctionlib objects I am opening, which is 9 in this case. My understanding is that these workers are the ones dealing with the SFs/smearing. Naively, I would expect that if I launched x workers, there would be at least 9x such tasks, but the number of tasks always remains 9 (even with something like 100 workers). I wonder whether this can be handled better with some tricks, because a lot of memory is wasted.
@kawaho sorry for the long reply time.
This might be something we can fix by inlining the correctionlib objects in the graph so that they appear on each node that needs to access correctionlib. We'd have to play with it a bit.
Right now it opens the correctionlib CorrectionSet 9 times for the whole workflow and transports the needed parts to the workers that need a specific correction within the file. I'll see if this can be done a little bit more leanly (at the cost of speed, probably).
I think you can also get some mileage out of not opening the correction sets each time you call process, but rather once when you make the processor.
I am trying to use correctionlib within coffea+dask to apply a set of corrections. Specifically, I am applying jet smearing, pileup corrections, muon id/iso/trigger, electron reco/id scale factors (pretty standard workflow). The following code is a minimal example:
However, dask gives a warning about large graph size
and more importantly, on the dask dashboard, the workers show unreasonable memory usage (labelled as unmanaged (old) memory in dask), for example with step_size=20_000,
Eventually, the jobs would fail because of the high memory usage.
If one chooses to run only the smearing or only the scale factors, the memory problem disappears, but the large graph size warning persists when running jet smearing alone.
coffea version is '2024.5.0'; correctionlib version is '2.5.0'.