bloomberg / memray

Memray is a memory profiler for Python
https://bloomberg.github.io/memray/
Apache License 2.0

empty flamegraph/summary with large memray dump #555

Closed masterkidan closed 7 months ago

masterkidan commented 7 months ago

Is there an existing issue for this?

Current Behavior

Hello folks, I am using the memray API to automatically capture traces around Celery tasks. The idea is that I start the capture before the task runs and stop it after the task finishes. The dump file is fairly large (around 300 MB). After capturing it, when I try to render the flamegraph/table, the rendered page shows up as empty, despite the generated HTML being 40 MB in size. The same is true for the memray summary command as well.

Interestingly, the memray stats and memray tree commands work as expected and generate reasonable output describing what the largest allocators are, etc. If I were to use memray attach and dump it to a file on the same machine, I am able to get a flamegraph generated.

Expected Behavior

flamegraph/table should not be empty

Steps To Reproduce

It is hard to provide repro steps here. What I can share is that the heap gets as large as 40 GB.

Memray Version

1.10.0

Python Version

3.9

Operating System

Linux

Anything else?

No response

pablogsal commented 7 months ago

Hi @masterkidan

Please understand that without a reproducer we cannot help you here. Can you at least send us the dump file so we can take a look?

pablogsal commented 7 months ago

If I were to use memray attach and dump it to a file on the same machine, I am able to get a flamegraph generated.

What do you mean by "on the same machine"? Does this mean you are generating on one machine and analysing on another?

masterkidan commented 7 months ago

Just wanted to clarify that I was analyzing it on the same machine as where I had generated the report. I am happy to share the dump file privately. Is there some place where I can upload it?

masterkidan commented 7 months ago

github doesn't allow me to upload files > 25 MB in size

pablogsal commented 7 months ago

github doesn't allow me to upload files > 25 MB in size

You can upload it elsewhere and paste a link here

pablogsal commented 7 months ago

Otherwise you can send it to me at pablogsal [AT] gmail [DOT] com

masterkidan commented 7 months ago

OK, sent a OneDrive link to the above email, lmk if you need anything else. Thanks for the help, and for this amazing tool!

pablogsal commented 7 months ago

Can you confirm the memray version you used? I am getting

Reason: The provided input file is incompatible with this version of memray.: unspecified iostream_category error

When I use memray 1.10.0

masterkidan commented 7 months ago

That's odd. It does say memray 1.10.0 when I run memray --version

godlygeek commented 7 months ago

It seems like something has replaced every 0x0A byte in that capture file with a 0x0D 0x0A pair, as though unix2dos was run on it, or something like that... That's why the version number doesn't match. The capture file is supposed to contain a 4-byte little-endian integer with a value of 10 for the capture format version (bytes 0A 00 00 00), and it instead contains a 4-byte little-endian integer with a value of 2573 (bytes 0D 0A 00 00).

All I can imagine is that something treated it as text instead of binary and replaced every LF with CRLF.
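The arithmetic behind that mismatch can be reproduced with a short sketch. It only models a standalone 4-byte little-endian field holding 10; it does not reproduce memray's actual header layout, and `lf_to_crlf` is a hypothetical helper standing in for whatever tool performed the conversion.

```python
# Illustration only: a lone 4-byte little-endian field, not memray's real header.
version_field = (10).to_bytes(4, "little")  # b"\x0a\x00\x00\x00"


def lf_to_crlf(data: bytes) -> bytes:
    """Mimic a text-mode conversion that expands every LF (0x0A) into CR LF."""
    return data.replace(b"\n", b"\r\n")


corrupted = lf_to_crlf(version_field)  # now 5 bytes: 0D 0A 00 00 00

# A reader still consumes 4 bytes, so it sees 0D 0A 00 00 instead of 0A 00 00 00.
misread_version = int.from_bytes(corrupted[:4], "little")
print(misread_version)
```

Because 0x0A happens to be the byte value of a line feed, a version of 10 is exactly the kind of value a text-mode conversion would damage.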

godlygeek commented 7 months ago

It does seem to parse successfully if I drop every 0x0D that is followed by a 0x0A from it... 👀

masterkidan commented 7 months ago

I had to copy the file from the Linux server to my Windows box; maybe that runs it through unix2dos somewhere … though it's weird.

godlygeek commented 7 months ago

Well, here's what I can tell you: the flamegraph you sent (flamegraph4.html) is truncated:

$ tail -c 100 flamegraph4.html
,[1708646587137,12357906432,12096164517],[1708646587147,12357906432,12096164517],[1708646587157,1235

That ought to end with a </html>. So, that's why that isn't working.

Same problem on the table_dump.html that you sent:

$ tail -c 100 table_dump.html
800,12095838352],[1708640199744,12356300800,12095838352],[1708640199754,12356300800,12095838352],[17

Both of those are truncated in the middle of writing some JSON data into the HTML document. That's not something Memray did wrong - that's not something we even could do wrong.
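The same check that `tail -c 100` performs by eye can be scripted. The following is a minimal sketch, assuming a complete report ends with a closing `</html>` tag somewhere in its last few hundred bytes; `looks_truncated` and its `tail_size` parameter are hypothetical names, not part of memray.

```python
import os


def looks_truncated(path: str, tail_size: int = 256) -> bool:
    """Return True if the last `tail_size` bytes contain no closing </html> tag."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Read only the tail so that multi-megabyte reports stay cheap to check.
        f.seek(max(0, size - tail_size))
        tail = f.read()
    return b"</html>" not in tail.lower()
```

Running `looks_truncated("flamegraph4.html")` right after generating a report, in the same container, would show whether the file was already incomplete before it was copied anywhere.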

Moreover, if I drop every CR that is followed by an LF from the dump.memtrace that you sent, then memray flamegraph does successfully generate a flame graph HTML which loads successfully for me.

So: my educated guess here is that you exceeded your disk quota on whatever system you were running this on, and so only part of the HTML file contents got written.

Can you check if it might be something like that?

masterkidan commented 7 months ago

So the above dump was from a Kubernetes Linux container running in Azure. Unfortunately the container is gone now, so I don't think I'll be able to confirm whether the disk was full. I do remember being able to write the table_dump and other files to disk, so it may be unlikely. Maybe there was some error in flushing the HTML to disk? I'll keep an eye out if this recurs and try to come up with better steps for repro. Feel free to close this issue, and thanks for the help! If you can share the fixed dump.trace file back that would be super helpful!

godlygeek commented 7 months ago

Feel free to close this issue, and thanks for the help!

Sure - if you're able to reproduce the issue and can confirm it isn't a problem with your disk quota being exceeded, feel free to reach back out.

If you can share the fixed dump.trace file back that would be super helpful!

Here's the program I ran to fix it:

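import itertools  # itertools.pairwise requires Python 3.10+

with open("dump.memtrace", "rb") as f:
    data = f.read()
    # Append a sentinel byte so that pairwise() also yields the real last byte.
    data += b"x"

with open("dump.memtrace.fixed", "wb") as f:
    # Iterating over bytes yields ints; visit each byte together with its successor.
    for val, nxt in itertools.pairwise(data):
        # Drop every CR (0x0D) that is immediately followed by an LF (0x0A).
        if val != 0x0D or nxt != 0x0A:
            f.write(bytes([val]))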

I ran that with Python 3.12, the input was 314639950 bytes, the output was 314193920 bytes. It took about a minute to run - I just went for something dumb and easy, heh.

From there a simple memray flamegraph dump.memtrace.fixed generated a loadable flame graph.