Open green-cabbage opened 7 months ago
Can you please try this without using a dask distributed client and in single threaded mode. You will have to watch top to understand if there is a problem.
dask.compute(stuff, scheduler="sync")
and do not make a distributed.Client()
in a with block or otherwise.
I assume the 1 GB memory limit above is only to make it fail sooner?
Yes, 1 GiB memory limit is only to make it fail sooner. I suppose you would like me to use memray as well. I have never used that before, so it may take a while since I have to scan the documentations. Could you also be more specific when you say "watch top to understand if there is a problem". Like, what should I be looking for at top? The increasing memory usage?
Increasing RSS over time, @yimuchen may have some helpful scripts here.
One tool that you can use to test locally to see if this is dask
issue or a coffea
issue would be python's tracemalloc [1]. This can help narrow down which function is eating up memory. And example would be something like (might need a bit of tweaking as this was helping with debugging an issue with coffea==0.7
):
from coffea.nanoevents import NanoEventsFactory
snapshot = tracemalloc.take_snapshot()
for filename in [f'/srv/Ntuples/{i}_RA2AnalysisTree.root' for i in range(20)]:
step_size = 500 # Force slicing the file into multiple chunks to see if this memory leaks appear per chunk
for start in range(0, 30000, step_size):
events = NanoEventsFactory.from_root(filename,
treepath='TreeMaker2/PreSelection',
schemaclass=BaseSchema,
metadata={
'dataset': 'test'
},
entry_start=start,
entry_stop=start + step_size).events()
run_my_analysis(events)
s2 = tracemalloc.take_snapshot()
for s in s2.statistics('lineno', cumulative=True)[:10]:
print(s)
What you would want to look out for is if there is a singular function that monotonously increases in each subsequent s2.statistics
print-outs. If it points to some function in coffea/awkward, that might be where a real memory leaking is happening and helps narrow down the investigation range.
There is also memory-profiler [1], but this only shows if there are leaks happening, not so much where the leak is appearing. But it makes plots which might be easier on the eyes then trying to guess if there are leaks based on text outputs (remember to use the --multiprocess
or --include-children
flag if you are running local multithread dask)
I'm trying your demo repository (https://github.com/green-cabbage/dask_mem_leak_test) now, with
coffea verion: 2024.4.0
distributed verion: 2024.4.1
uproot verion: 5.3.2
(from coffeateam/coffea-dask-almalinux8:2024.4.0-py3.11
)
and not seeing much memory growth. After the first chunk is done it seems to stay steady at around 450MB.
Though, I do see a lot of
2024-04-12 13:01:13,086 - distributed.utils_perf - WARNING - full garbage collections took 17% CPU time recently (threshold: 10%)
I tried with and without
uproot_options={"handler" : uproot.XRootDSource}
with no real difference.
After 11 files, I get a read error:
OSError: File did not open properly: [ERROR] Server responded with an error: [3000] Unable to open - cannot determine the prefix path to use for the given filesystem id /store/data/Run2018A/SingleMuon/NANOAOD/UL2018_MiniAODv2_NanoAODv9-v2/2550000/22893D09-9B7B-694B-968E-DC8566E2272B.root; invalid argument
Sorry for the late reply. The OSError IMO is primarily due to purdue XRootD not being the most reliable. I have updated the input_file.json
and input_file_Big.json
in the latest commit here, which should be more reliable as I have replaced purdue's XRootD path with fermilab's.
Moreover, observing lots of
2024-04-12 13:01:13,086 - distributed.utils_perf - WARNING - full garbage collections took 17% CPU time recently (threshold: 10%)
is expected, but if you let your code run longer (in my test, it was around processing 13th root file out of 92), then you would start to read statements along the lines of:
024-04-14 19:28:40,392 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 500.21 MiB -- Worker memory limit: 716.80 MiB
if you keep running the script, the unmanaged memory slowly creeps up until the client automatically goes through phases of pausing and unpausing until it completely freezes due to high unmanaged memory.
On a seperate note, I have ran the local client test with and without uproot_options={"handler" : uproot.XRootDSource}
using local paths (NOT XRootD path) and I still observe the unmanaged memory statement. So I believe this issue is not related to XRootD.
Edit: I have made a dask report of the running through memleakage_demo.ipynb the with local root file paths using dask local client here, and you can see in the task stream tab that the gap between the "pt" workload increases dramatically due to significant compute time being dedicated to managing leaking memory from the client.
As a small update, it does seem like it is a XRootD problem. I re-ran the reproducer with local root files path, but with memory increased from 0.7 GiB to 1 GiB, and while I had lots warnings like distributed.utils_perf - WARNING - full garbage collections took 26% CPU time recently (threshold: 10%)
, the worker never had to pause due to memory leakage.
On a different note, I am not so sure 26% CPU time being used for garbage collection is normal. Is this something I need to look at it separately to optimize my workflow?
I think you've found instead the "steady state" memory usage.
Lots of GC time means you're making a bunch of temporaries. But that's easier to diagnose with the memory tools you have at hand.
It might be worth it to give the latest coffea and dask-awkward/dask-histogram a try if you're not already.
This is a quick report of an issue I have noticed with newer version of coffea (2024.3.0) with updates from interactions I had with @lgray . Reproducer of this issue is pushed onto my public repo: https://github.com/green-cabbage/dask_mem_leak_test. Please let me know if I am unclear some certain parts, as I am not experienced in making bug reports.
The basic gist is that loading root files using XRootD links and then doing simple processing leads to dask client's memory slowly increasing until memory overflow unless regularly restarted. This happens for both local cluster as well as external cluster (screen shot of residual memory after no more jobs is added below). The leakage is observed for both
NanoEventsFactory.from_root()
anduproot.dask()
functions. However, memory leakage is not observed when explicitly usingXRootDSource
handler for uproot, as in:uproot.dask(, handler=uproot.XRootDSource)
. However, implementingXRootDSource
handler forNanoEventsFactory.from_root()
(ieNanoEventsFactory.from_root( ,uproot_options={"handler" : uproot.XRootDSource})
) still leads to memory overload.Moreover, using local filenames instead of XRootD links on
uproot.dask()
doesn't generate the leakage. The same withNanoEventsFactory.from_root()
has not been tried.