bloomberg / memray

Memray is a memory profiler for Python
https://bloomberg.github.io/memray/
Apache License 2.0

Add support for custom timestamped snapshots of memory #678

Open jgbradley1 opened 1 week ago

jgbradley1 commented 1 week ago

Is there an existing proposal for this?

Is your feature request related to a problem?

I have a long-running process defined as an asynchronous pipeline of multiple steps. You may think of it as an ETL pipeline. These steps can consist of activities such as modifying pandas dataframes, for example.

Not all steps have memory issues, but I would like to use memray to understand how memory is used and passed around in my pipeline over time. There is a memory explosion that occurs in one of the final steps of the pipeline.

Since the current flamegraph report only displays a snapshot of memory use at the moment of peak memory, it is hard for me to investigate what in the prior steps of the pipeline might be contributing to the explosion. Analyzing the point of peak memory is very insightful, but it does not capture enough information to tell apart the parts of my pipeline that could be optimized from those that cannot (some causes are known and unavoidable; for example, a join between pandas dataframes is always memory expensive). Reducing memory in the prior steps instead would lessen the impact of the explosion in later steps and still lead to an overall reduction in memory.

Describe the solution you'd like

Is there a way to plant some sort of marker in my code (e.g. a function decorator or an API call) that signals to memray to take a snapshot of the memory usage?

Ideally I'd like to look at a plot of heap memory usage over time and see markers where my code called into memray to record a place in time and code. This functionality would enable users to make custom calls to memray in their code (my pipeline code for example) to generate timestamped snapshots of memory and compare them over time (not just at peak memory usage).
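
To make the request concrete, here is a rough sketch of the kind of marker I have in mind. It uses the standard library's `tracemalloc` as a stand-in, not memray, and the names `mark`, `snapshot_after`, and `SNAPSHOTS` are hypothetical helpers invented for illustration; none of them are memray APIs. The two steps are synchronous toy stand-ins for my real pipeline steps.

```python
import functools
import time
import tracemalloc

# Hypothetical helpers for illustration only; none of these are memray APIs.
SNAPSHOTS = []  # entries: (label, seconds since start, current bytes, peak bytes)
_START = time.monotonic()


def mark(label):
    """Record a timestamped reading of traced memory under the given label."""
    current, peak = tracemalloc.get_traced_memory()
    SNAPSHOTS.append((label, time.monotonic() - _START, current, peak))


def snapshot_after(func):
    """Decorator that records a marker each time the wrapped step returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        mark(func.__name__)
        return result
    return wrapper


@snapshot_after
def load_step():
    # Stand-in for building a dataframe.
    return [bytes(1024) for _ in range(1000)]


@snapshot_after
def join_step(rows):
    # Stand-in for a memory-expensive join.
    return rows + [bytes(1024) for _ in range(4000)]


tracemalloc.start()
rows = load_step()
joined = join_step(rows)
tracemalloc.stop()

for label, elapsed, current, peak in SNAPSHOTS:
    print(f"{label:10s} t={elapsed:.3f}s "
          f"current={current / 1e6:.2f}MB peak={peak / 1e6:.2f}MB")
```

Each marker ties a point in time to a place in the code, so the readings could be overlaid on a plot of heap usage over time. What I am asking for is the equivalent hook in memray, so that the markers show up alongside its own allocation data.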

Alternatives you considered

No response

godlygeek commented 1 week ago

Have you seen https://bloomberg.github.io/memray/flamegraph.html#temporal-flame-graphs ? The --temporal mode lets you compare the memory usage between two moments in your program's execution.