CoffeaTeam / coffea

Basic tools and wrappers for enabling not-too-alien syntax when running columnar Collider HEP analysis.
https://coffea-hep.readthedocs.io
BSD 3-Clause "New" or "Revised" License
134 stars 127 forks source link

Observed memory leakage in NanoeventsFactory.from_root() #1072

Open green-cabbage opened 7 months ago

green-cabbage commented 7 months ago

This is a quick report of an issue I have noticed with newer version of coffea (2024.3.0) with updates from interactions I had with @lgray . Reproducer of this issue is pushed onto my public repo: https://github.com/green-cabbage/dask_mem_leak_test. Please let me know if I am unclear some certain parts, as I am not experienced in making bug reports.

The basic gist is that loading root files using XRootD links and then doing simple processing leads to dask client's memory slowly increasing until memory overflow unless regularly restarted. This happens for both local cluster as well as external cluster (screen shot of residual memory after no more jobs is added below). The leakage is observed for both NanoEventsFactory.from_root() and uproot.dask() functions. However, memory leakage is not observed when explicitly using XRootDSource handler for uproot, as in: uproot.dask(, handler=uproot.XRootDSource). However, implementing XRootDSource handler for NanoEventsFactory.from_root() (ie NanoEventsFactory.from_root( ,uproot_options={"handler" : uproot.XRootDSource})) still leads to memory overload.

Moreover, using local filenames instead of XRootD links on uproot.dask() doesn't generate the leakage. The same with NanoEventsFactory.from_root() has not been tried.

memLeak_afterInterrupt png

lgray commented 7 months ago

Can you please try this without using a dask distributed client and in single threaded mode. You will have to watch top to understand if there is a problem.

dask.compute(stuff, scheduler="sync") and do not make a distributed.Client() in a with block or otherwise.

I assume the 1 GB memory limit above is only to make it fail sooner?

green-cabbage commented 7 months ago

Yes, 1 GiB memory limit is only to make it fail sooner. I suppose you would like me to use memray as well. I have never used that before, so it may take a while since I have to scan the documentations. Could you also be more specific when you say "watch top to understand if there is a problem". Like, what should I be looking for at top? The increasing memory usage?

lgray commented 7 months ago

Increasing RSS over time, @yimuchen may have some helpful scripts here.

yimuchen commented 7 months ago

One tool that you can use to test locally to see if this is dask issue or a coffea issue would be python's tracemalloc [1]. This can help narrow down which function is eating up memory. And example would be something like (might need a bit of tweaking as this was helping with debugging an issue with coffea==0.7):

from coffea.nanoevents import NanoEventsFactory
snapshot = tracemalloc.take_snapshot()
for filename in [f'/srv/Ntuples/{i}_RA2AnalysisTree.root' for i in range(20)]:
  step_size = 500 # Force slicing the file into multiple chunks to see if this memory leaks appear per chunk
  for start in range(0, 30000, step_size):
    events = NanoEventsFactory.from_root(filename,
                                         treepath='TreeMaker2/PreSelection',
                                         schemaclass=BaseSchema,
                                         metadata={
                                             'dataset': 'test'
                                         },
                                         entry_start=start,
                                         entry_stop=start + step_size).events()
    run_my_analysis(events)
  s2 = tracemalloc.take_snapshot()

  for s in s2.statistics('lineno', cumulative=True)[:10]:
    print(s)

What you would want to look out for is if there is a singular function that monotonously increases in each subsequent s2.statistics print-outs. If it points to some function in coffea/awkward, that might be where a real memory leaking is happening and helps narrow down the investigation range.

[1] https://docs.python.org/3/library/tracemalloc.html

yimuchen commented 7 months ago

There is also memory-profiler [1], but this only shows if there are leaks happening, not so much where the leak is appearing. But it makes plots which might be easier on the eyes then trying to guess if there are leaks based on text outputs (remember to use the --multiprocess or --include-children flag if you are running local multithread dask)

[1] https://pypi.org/project/memory-profiler/

nsmith- commented 7 months ago

I'm trying your demo repository (https://github.com/green-cabbage/dask_mem_leak_test) now, with

coffea verion: 2024.4.0
distributed verion: 2024.4.1
uproot verion: 5.3.2

(from coffeateam/coffea-dask-almalinux8:2024.4.0-py3.11) and not seeing much memory growth. After the first chunk is done it seems to stay steady at around 450MB.

Screenshot 2024-04-12 at 1 06 39 PM

Though, I do see a lot of

2024-04-12 13:01:13,086 - distributed.utils_perf - WARNING - full garbage collections took 17% CPU time recently (threshold: 10%)

I tried with and without

uproot_options={"handler" : uproot.XRootDSource}

with no real difference.

After 11 files, I get a read error:

OSError: File did not open properly: [ERROR] Server responded with an error: [3000] Unable to open - cannot determine the prefix path to use for the given filesystem id /store/data/Run2018A/SingleMuon/NANOAOD/UL2018_MiniAODv2_NanoAODv9-v2/2550000/22893D09-9B7B-694B-968E-DC8566E2272B.root; invalid argument
green-cabbage commented 7 months ago

Sorry for the late reply. The OSError IMO is primarily due to purdue XRootD not being the most reliable. I have updated the input_file.json and input_file_Big.json in the latest commit here, which should be more reliable as I have replaced purdue's XRootD path with fermilab's.

Moreover, observing lots of 2024-04-12 13:01:13,086 - distributed.utils_perf - WARNING - full garbage collections took 17% CPU time recently (threshold: 10%) is expected, but if you let your code run longer (in my test, it was around processing 13th root file out of 92), then you would start to read statements along the lines of: 024-04-14 19:28:40,392 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 500.21 MiB -- Worker memory limit: 716.80 MiB

if you keep running the script, the unmanaged memory slowly creeps up until the client automatically goes through phases of pausing and unpausing until it completely freezes due to high unmanaged memory.

On a seperate note, I have ran the local client test with and without uproot_options={"handler" : uproot.XRootDSource} using local paths (NOT XRootD path) and I still observe the unmanaged memory statement. So I believe this issue is not related to XRootD.

Edit: I have made a dask report of the running through memleakage_demo.ipynb the with local root file paths using dask local client here, and you can see in the task stream tab that the gap between the "pt" workload increases dramatically due to significant compute time being dedicated to managing leaking memory from the client.

green-cabbage commented 7 months ago

As a small update, it does seem like it is a XRootD problem. I re-ran the reproducer with local root files path, but with memory increased from 0.7 GiB to 1 GiB, and while I had lots warnings like distributed.utils_perf - WARNING - full garbage collections took 26% CPU time recently (threshold: 10%) , the worker never had to pause due to memory leakage.

On a different note, I am not so sure 26% CPU time being used for garbage collection is normal. Is this something I need to look at it separately to optimize my workflow?

lgray commented 7 months ago

I think you've found instead the "steady state" memory usage.

Lots of GC time means you're making a bunch of temporaries. But that's easier to diagnose with the memory tools you have at hand.

It might be worth it to give the latest coffea and dask-awkward/dask-histogram a try if you're not already.