Our flamegraph, by default, shows a snapshot of the point in time at which the largest amount of heap memory was allocated and not yet freed. It shows the memory that had been allocated as of that high water mark and not yet freed, aggregated by call stack. We can accomplish this because we can accurately track whenever any heap memory is allocated or freed, by injecting hooks that override the allocation and deallocation functions in the procedure linkage table (see #225).
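To illustrate the general technique (this is just a hedged sketch, not Memray's actual implementation; Memray overrides the PLT entries rather than relying on `LD_PRELOAD`), an interposing shared library can observe every allocation event the moment it happens:

```c
/* Hedged sketch of an LD_PRELOAD-style malloc/free interposer. This is NOT
 * how Memray works (Memray patches PLT entries instead), but it shows how
 * each allocation and deallocation can be intercepted as it occurs.
 * Build: gcc -shared -fPIC hooks.c -o hooks.so -ldl
 * Run:   LD_PRELOAD=./hooks.so ./your_program */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void *malloc(size_t size) {
    static void *(*real_malloc)(size_t);
    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    void *ptr = real_malloc(size);
    /* A real profiler would record the call stack here; the work per event
     * is O(1) in the number of live allocations. (A production interposer
     * also needs reentrancy guards, omitted here for brevity.) */
    char buf[64];
    int n = snprintf(buf, sizeof buf, "malloc(%zu) = %p\n", size, ptr);
    write(STDERR_FILENO, buf, (size_t)n);  /* write(2) avoids reentering malloc */
    return ptr;
}

void free(void *ptr) {
    static void (*real_free)(void *);
    if (!real_free)
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    char buf[32];
    int n = snprintf(buf, sizeof buf, "free(%p)\n", ptr);
    write(STDERR_FILENO, buf, (size_t)n);
    real_free(ptr);
}
```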
I can't see any way to do something analogous for RSS instead of heap size. I'm not aware of any way for us to install hooks that would allow us to be notified when pages are swapped out to disk or back in by the kernel. Without being able to monitor memory being paged in or out, there would be no way for us to determine how many bytes of each heap allocation are actually resident in memory at any given time, and without knowing that we can't know when the resident set size reaches its high water mark, or how many bytes were allocated by each unique stack at that point in time.
The closest thing I'm aware of that exists is the `mincore` API. Instead of being pushed information about swaps as they occur, we could use `mincore` to periodically poll for whether changes have happened. There are a lot of problems with that idea, though. For one, there's a fundamental time-of-check/time-of-use issue associated with `mincore`: by the time an answer is returned to us, it might already be wrong (because the kernel might immediately and asynchronously perform some paging due to memory pressure after returning the results to us).

Even if it were reliable, it wouldn't let us efficiently find a delta since the last time we polled. We'd need to loop over every page in the process's address space (or at least, all the ones that contain heap-allocated objects) and check whether each is currently in RAM or paged out to compute a delta ourselves. That's O(n) with the number of allocations, which means it would be far slower than anything that Memray does today (as of today, every operation we perform when a new allocation or deallocation occurs is O(1) with respect to the number of allocations that have previously been performed). Also, because polling only captures periodic snapshots of the resident pages, there's a chance that we'd miss the high water mark entirely: if a bunch of data is paged in and then quickly paged out between two of our poll intervals, we'd never see that it had been paged in, and the report we'd generate that supposedly tells the user their peak resident set size would be wrong.
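To make that cost concrete, here's a hedged, Linux-only sketch of what a single `mincore` poll over one region involves; a profiler would need to repeat this scan for every tracked allocation at every poll interval, and the answer can be stale the instant the call returns:

```c
/* Hedged sketch of polling mincore(2): count how many pages of one mapping
 * are resident. Linux-only, assumes a 64-bit system; addr must be
 * page-aligned. The result may already be stale when mincore returns (the
 * TOCTOU problem discussed above), and scanning every tracked region is O(n). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t resident_bytes(void *addr, size_t length) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (length + page - 1) / page;
    unsigned char *vec = malloc(npages);  /* mincore fills one byte per page */
    size_t resident = 0;
    if (vec && mincore(addr, length, vec) == 0) {
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1)               /* low bit set => page is in RAM */
                resident += page;
    }
    free(vec);
    return resident;
}

int main(void) {
    size_t len = 5UL << 30;  /* 5 GiB, matching the allocation in this issue */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    printf("resident before touching: %zu bytes\n", resident_bytes(p, len));
    memset(p, 1, 1 << 20);   /* touch only the first 1 MiB */
    printf("resident after touching:  %zu bytes\n", resident_bytes(p, len));
    return 0;
}
```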
So, I'm not seeing any reasonable way to get you a report like the one you're describing. Are you aware of any other memory profilers that are able to give a report of allocations by call stack that doesn't count data that is swapped out or not backed by pages?
I'm going to close this. As things stand today, I don't see any reasonable way to achieve the sort of report you're hoping for. If anyone sees a way that this could be done, let me know and we can discuss whether it would be worthwhile for Memray to pursue it.
Is there an existing proposal for this?
Is your feature request related to a problem?
I have profiled a Python program using memray native mode; the program calls C++ functions. In this example, to be illustrative, I malloc'd a 5.0 GB region once and never touched it afterwards. That memory therefore occupies virtual address space but does not count toward the resident size, as also depicted in #246 and #273.
As you can see in the detailed flamegraph, it clearly shows the 5.0 GB memory allocation. So my question is: is it possible for the flamegraph to show the actual resident usage of the program? Thanks.
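To illustrate the difference, here is a minimal, Linux-only reproduction (a hedged sketch; it reads `/proc/self/status` and assumes the allocator leaves untouched pages unbacked by RAM) showing a large untouched `malloc` raising VmSize but not VmRSS:

```c
/* Hedged sketch: a large allocation that is never touched raises VmSize
 * (virtual) but barely moves VmRSS (resident), because the kernel only backs
 * pages with physical RAM on first use. Linux-only. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void print_mem(const char *label) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    while (f && fgets(line, sizeof line, f))
        if (!strncmp(line, "VmSize:", 7) || !strncmp(line, "VmRSS:", 6))
            printf("%s %s", label, line);
    if (f)
        fclose(f);
}

int main(void) {
    print_mem("before:");
    char *p = malloc(5UL << 30);  /* ~5 GiB, as in the report; never touched */
    print_mem("after: ");
    free(p);
    return 0;
}
```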
Describe the solution you'd like
I would like it if the detailed flamegraph produced by memray native mode let users choose whether to show h_vmem or s_rss in detail.