x/tools/cmd/heapdump: create a heap dump viewer

matloob commented 8 years ago

@alandonovan @randall77 @aclements

I'd like to propose a new tool: a viewer for go heap dumps.

runtime/debug.WriteHeapDump already provides a mechanism for writing heap dumps, but we don't provide a tool to inspect the heap dumps. We should provide a graphical tool for inspecting and analyzing heap dumps similar to those that exist for Java heap dumps.
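For reference, a minimal sketch of producing such a dump today with runtime/debug.WriteHeapDump. The helper name dumpHeap and the output path are illustrative assumptions, not part of any existing tool:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/debug"
)

// dumpHeap writes a heap dump of the current process to path and
// returns the size of the resulting file in bytes.
// (dumpHeap is a hypothetical helper used only for this sketch.)
func dumpHeap(path string) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	// WriteHeapDump suspends all goroutines while it writes a description
	// of the heap to the given file descriptor.
	debug.WriteHeapDump(f.Fd())

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	n, err := dumpHeap("example.heapdump") // hypothetical output path
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("wrote %d bytes\n", n)
}
```

The resulting file is what a viewer would have to parse; nothing in the standard distribution reads it back.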

The tool would fit best in the tools subrepo or the core repo.

fjl commented 8 years ago

A viewer for Go 1.4 dumps is available at https://github.com/randall77/heapdump14. I have successfully used it to debug my application (compiled with Go 1.6), but some hacking was required to make it work.

The challenge for this tool is that the heapdump format doesn't contain complete type information. There is a lot of guesswork involved to figure out the types of all objects. In my case it failed to recognize most of them and I had to add an additional heuristic that tries to identify types by matching their GC signature.
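A rough sketch of what such a heuristic can look like. This is not the actual heapdump14 change: the typeInfo record, the one-entry-per-word mask representation, and the function names are all assumptions made for illustration.

```go
package main

import "fmt"

// typeInfo is a hypothetical record for a candidate type: its name and a
// pointer mask with one entry per word (true means the word holds a pointer).
type typeInfo struct {
	name    string
	ptrMask []bool
}

// equalMask reports whether two pointer masks are identical.
func equalMask(a, b []bool) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// matchBySignature returns the names of all candidate types whose pointer
// mask is identical to the mask recorded for an untyped heap object.
// More than one name may come back, so the result is only a hint.
func matchBySignature(objMask []bool, candidates []typeInfo) []string {
	var matches []string
	for _, c := range candidates {
		if equalMask(objMask, c.ptrMask) {
			matches = append(matches, c.name)
		}
	}
	return matches
}

func main() {
	candidates := []typeInfo{
		{name: "main.node", ptrMask: []bool{true, false, true}},
		{name: "main.header", ptrMask: []bool{false, false, true}},
		{name: "main.leaf", ptrMask: []bool{false}},
	}
	fmt.Println(matchBySignature([]bool{true, false, true}, candidates))
}
```

Types with identical layouts are indistinguishable this way, which is why it remains guesswork.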

I would love to see a better tool; heap dumps are invaluable when it comes to finding memory leaks.

@randall77 works on Go and can probably say more about this.

aclements commented 8 years ago

@fjl, I think the plan is to start with heapdump14 (or at least salvage as much as possible from it). Broadly, this may also involve making the heap dump format more friendly to analysis tools, so they don't have to do as much guesswork.

randall77 commented 8 years ago

Yes, the 1.3 viewer was awesome because we had full type info for every object in the heap. 1.4 lost that info and the 1.4 viewer tries (badly) to reconstruct type info as much as it can from the DWARF info for the roots. We'd want to do a better job of that if possible.

What's the plan for heap dump format? We talked a while ago about changing to use a standard core dump with some breadcrumbs to encode the heap metadata. Any progress on that?

aclements commented 8 years ago

@matloob, it would be great if the heap reader part of the tool lived in a separate package that can be used to build tools other than the viewer. There have been several times when I've wanted to do one-off heap analyses (usually for debugging the GC), but the tooling simply wasn't feasible. Such a package wouldn't necessarily have to satisfy Go 1 compatibility.

fjl commented 8 years ago

For my use cases, it would be perfectly acceptable if precise heapdump analysis required a GODEBUG flag (e.g. to enable tracking of interface types) or automated source rewrites like cmd/cover does. That would probably simplify the analysis code a lot.

matloob commented 8 years ago

@fjl Yes, the goal of this proposal is to have an up-to-date heap viewer. The end product will look quite different from the current viewer, but we'll reuse as much of it as we can.

@aclements Yes, I'll try to keep as much of the tool in separate packages that can be used to read or analyze a heap.

@randall77 Yes, I think it would be good to use a standard core dump + breadcrumbs, but I haven't looked very deeply yet.

matloob commented 8 years ago

Is it okay if I start working on a prototype in the tools subrepo? I'm thinking of putting things into the 'cmd/heapview' subdirectory.

bradfitz commented 8 years ago

SGTM.

matloob commented 8 years ago

On second thought I think 'goheap' might be a better name for the command, to distinguish from other heap related commands on a system.

bradfitz commented 8 years ago

I don't like prefacing everything with "go".

to distinguish from other heap related commands on a system.

I don't have any other heap-related commands on my system. :-)

matloob commented 8 years ago

heapdump it is! we can always change the name later

aclements commented 8 years ago

heapview? "heapdump" says to me that it dumps the heap, not that it analyzes heap dumps. (But my opinion on this is not very strong. :)

bradfitz commented 8 years ago

heapview is also fine with me.

matloob commented 8 years ago

My opinion isn't very strong either, but one advantage of heapdump is that it can name a more general tool.

So you might use heapdump view to start the viewer or heapdump grab to grab a heapdump or heapdump stats to get some stats on a heap dump?

aclements commented 8 years ago

I would lean toward making those separate tools instead of using subcommands. In particular, if there's a library that lets people write other tools to process heap dumps, making these separate commands puts those other tools on the same footing, and users don't have to remember what is a subcommand of the "official" heapdump command and what isn't.

matloob commented 8 years ago

ok, looks like everyone's ok with heapview, so unless there are any objections, i'll start using that name?

gopherbot commented 8 years ago

CL https://golang.org/cl/25101 mentions this issue.

gopherbot commented 8 years ago

CL https://golang.org/cl/25240 mentions this issue.

gopherbot commented 8 years ago

CL https://golang.org/cl/25273 mentions this issue.

tombergan commented 8 years ago

It turns out I prototyped part of what this issue is asking for without realizing this issue existed. Thanks to @matloob for pointing me here. I also fixed a few bugs in the old heapview code. https://github.com/tombergan/goheapdump

I agree with Austin that the main artifact of this issue should be an API for analyzing heap dumps. I took a stab at such an API here. It would be great to unify the API for heap analysis with the API in x/debug. I'm actually more excited about the idea of an automated heap checker rather than just a heap viewer (although that would be cool and useful as well). I wrote a simple http.Response leak checker here. I have more thoughts in that file about heap checkers we could write.

Also, a big +1 to making the heapdump just a core file, possibly with extra breadcrumbs. I ran into a few problems with the current heapdump format. I tried to fix some of them, but couldn't easily fix all of them -- I ran into similar type-matching issues for interfaces as @fjl. My initial thought was to use a core file without breadcrumbs. This way we could potentially deprecate runtime.WriteHeapDump entirely and instead use any core file. Roughly:

1. Use types from DWARF to bootstrap the heapdump loader. This gives you enough info to find and walk the mspans, _types, itabs, and so on.
2. Walk those runtime structures to learn types for the entire heap.
3. Walk the GC masks to enumerate all the pointers tracked by GC.
4. Export all of this to the client using a nice, reflect-like API.
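The mask-walking step above can be sketched roughly as follows. This assumes a simplified mask layout (one bit per 8-byte word, least significant bit first within each byte) and a little-endian 64-bit target; the runtime's real bitmap encoding is different and is not stable across releases. The name pointersIn is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const wordSize = 8 // assumption: 64-bit target

// pointersIn returns the non-nil pointer values stored in obj, using mask to
// decide which words hold pointers. Bit i of mask describes word i of obj.
// This layout is a simplification chosen for the sketch.
func pointersIn(obj, mask []byte) []uint64 {
	var ptrs []uint64
	for i := 0; (i+1)*wordSize <= len(obj) && i/8 < len(mask); i++ {
		if mask[i/8]&(byte(1)<<uint(i%8)) == 0 {
			continue // scalar word, not tracked by the GC
		}
		p := binary.LittleEndian.Uint64(obj[i*wordSize:])
		if p != 0 {
			ptrs = append(ptrs, p)
		}
	}
	return ptrs
}

func main() {
	// A three-word object: pointer, scalar, pointer.
	obj := make([]byte, 3*wordSize)
	binary.LittleEndian.PutUint64(obj[0:], 0x1000)
	binary.LittleEndian.PutUint64(obj[8:], 42)
	binary.LittleEndian.PutUint64(obj[16:], 0x2000)

	mask := []byte{0x05} // binary 101: words 0 and 2 are pointers
	for _, p := range pointersIn(obj, mask) {
		fmt.Printf("%#x\n", p)
	}
}
```

Each returned value is an edge in the heap graph once it has been resolved to the object that contains the target address.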

The downside of this approach is that the runtime structures may change from release-to-release. This could be a pain to maintain.

Long story short, a tool like this is something I've wanted when debugging OOMs in my day-to-day work. It's also a technically interesting problem and I'm happy to help out as needed.

matloob commented 8 years ago

@tombergan I haven't had a chance to look very deeply at the API in your package yet, but it looks good.

My plan was to start by working on support libraries for reading and understanding core files as heap dumps, and to put that support code in golang.org/x/tools/cmd/heapview/internal. Once we're more confident about it, we can move the libraries into golang.org/x/tools/heap, but working in an internal package at first allows us some flexibility in changing the API.

I think it would be good to start with your API interface (and as much of the code as is applicable to cores as well as heaps) in golang.org/x/tools/cmd/heapview/internal and build from there. What do you think?

matloob commented 8 years ago

I've put up a proposal doc for this issue: https://github.com/golang/proposal/blob/master/design/16410-heap-viewer.md

rhysh commented 8 years ago

@matloob Will the ELF core file described in the proposal doc be the same (on ELF-based systems) as one generated by GOTRACEBACK=crash or gcore(1)? Is the ELF format all they'll have in common, or will they contain the same information and layout, such that the tools you're producing would be able to work on core files from any of those three sources?

Ideally, there will be a 'one-click' solution to get from running program to dump. One possible way to do this would be to add a library to expose a special HTTP handler. Requesting the page would trigger a core dump to a user-specified location on disk while the program's running, and start the heap dump viewer program.
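A minimal sketch of such a handler, with two loud assumptions: it calls runtime/debug.WriteHeapDump rather than producing a core file, and it does not launch a viewer. The names heapDumpHandler and triggerDump and the /debug/heapdump path are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"runtime/debug"
	"strings"
)

// heapDumpHandler returns a handler that writes a heap dump into dir and
// replies with the path of the file it created. An empty dir means the
// default temporary directory.
// Assumption: WriteHeapDump stands in for the core dump discussed above.
func heapDumpHandler(dir string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f, err := os.CreateTemp(dir, "heap-*.dump")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer f.Close()

		debug.WriteHeapDump(f.Fd()) // stops the world while writing
		fmt.Fprintln(w, f.Name())
	})
}

// triggerDump exercises the handler in-process and returns the HTTP status
// and the reported dump path. It exists only to keep the sketch runnable
// without opening a network port.
func triggerDump(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/debug/heapdump", nil)
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	status, path := triggerDump(heapDumpHandler(os.TempDir()))
	fmt.Println(status, path)
}
```

In a real service the handler would be registered on a mux with http.Handle and protected like any other debug endpoint, since each request pauses the program.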

What information is currently missing from Linux core dumps of Go programs that would be necessary to reconstruct the heap? What is required to include that information in GOTRACEBACK=crash core dumps?

randall77 commented 8 years ago

I've taken a look through the proposal doc and I like it.

@tombergan, I looked through your API. Here are a few comments:

We absolutely want to have the dump analyzer/viewer to be able to handle large heaps. At the same time, I think designing the analyzer to somehow process the dump with O(1) space is a hard research problem. I don't want us to tackle that in v1. For now, we can probably get away with using O(1) space per object (independent of object size) as the current viewer attempts to do. I expect most large heaps are large because of large objects ([]byte probably), so using O(1) space per object will reduce space usage significantly.
The API surface is really large. I'm not sure there is anything really to be done here, except to prune unnecessary or redundant stuff whenever we can.
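To illustrate the O(1)-space-per-object point above, a small sketch of a fixed-size index record. The objRecord layout and the heapIndex type are assumptions for illustration, not the current viewer's data structures.

```go
package main

import (
	"fmt"
	"unsafe"
)

// objRecord is a hypothetical fixed-size index entry. The object's contents
// stay in the dump file; only this record is kept in memory.
type objRecord struct {
	addr   uint64 // start address of the object in the dumped process
	size   uint64 // object size in bytes
	typeID uint32 // index into a separate type table
}

// heapIndex holds one record per object, regardless of object size.
type heapIndex struct {
	objs []objRecord
}

// add records one object in the index.
func (ix *heapIndex) add(addr, size uint64, typeID uint32) {
	ix.objs = append(ix.objs, objRecord{addr: addr, size: size, typeID: typeID})
}

// heapBytes is the total size of the objects described by the index.
func (ix *heapIndex) heapBytes() uint64 {
	var n uint64
	for _, o := range ix.objs {
		n += o.size
	}
	return n
}

// indexBytes is the memory used by the records themselves.
func (ix *heapIndex) indexBytes() uint64 {
	return uint64(len(ix.objs)) * uint64(unsafe.Sizeof(objRecord{}))
}

func main() {
	var ix heapIndex
	ix.add(0xc000000000, 1<<30, 0) // a 1 GiB []byte backing array
	ix.add(0xc040000000, 16, 1)    // a 16-byte struct
	fmt.Println("heap bytes: ", ix.heapBytes())
	fmt.Println("index bytes:", ix.indexBytes())
}
```

The 1 GiB object and the 16-byte object cost the same amount of index memory, which is where the savings for heaps dominated by large objects would come from.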

> What information is currently missing from Linux core dumps of Go programs that would be necessary to reconstruct the heap? What is required to include that information in GOTRACEBACK=crash core dumps?

This one should be our top priority. We'll want 1.8 to have all the fixes we need to get reliable types/breadcrumbs/dwarfinfo/etc. in the dumps (core files?). It would be a bummer to discover after the 1.8 freeze (Nov 1) that there is one bit of information we realize we needed but didn't have.

tombergan commented 8 years ago

Proposal LGTM! Some comments:

The advantage of the hprof format is that there already exist many tools for analyzing hprof dumps. It will be a good idea to consider this format more thoroughly before making a decision.

I vote against using hprof as the main format because it doesn't support interior pointers well. Everything is based on object ID. A lot of the interesting analyses we might want to do will need to understand interior pointers (example).
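To make the interior-pointer concern concrete: resolving a pointer that lands in the middle of an object needs a lookup by address range, not by object ID. A minimal sketch, assuming objects are kept sorted by start address and do not overlap; the object type and findObject are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// object describes one heap object by its address range.
type object struct {
	addr uint64 // start address
	size uint64 // size in bytes
}

// findObject returns the index of the object whose range [addr, addr+size)
// contains p, or -1 if p does not point into any object. objs must be
// sorted by addr and must not overlap.
func findObject(objs []object, p uint64) int {
	// i is the index of the first object that starts after p.
	i := sort.Search(len(objs), func(i int) bool { return objs[i].addr > p })
	if i == 0 {
		return -1 // p is below the first object
	}
	o := objs[i-1]
	if p-o.addr < o.size {
		return i - 1
	}
	return -1 // p falls in a gap between objects
}

func main() {
	objs := []object{
		{addr: 0x1000, size: 64},
		{addr: 0x2000, size: 16},
	}
	for _, p := range []uint64{0x1000, 0x1010, 0x1040, 0x2008} {
		fmt.Printf("%#x -> %d\n", p, findObject(objs, p))
	}
}
```

A format keyed purely on object IDs has no natural place to record that 0x1010 refers to offset 16 inside the first object.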

@rhysh Will the ELF core file described in the proposal doc be the same (on ELF-based systems) as one generated by GOTRACEBACK=crash or gcore(1)? What information is currently missing from Linux core dumps of Go programs that would be necessary to reconstruct the heap?

Good questions that the proposal should answer. I believe there are three things we want to extract from the core dump, besides the usual DWARF data: (1) dynamic types of interface values, (2) locations of goroutine stacks, and (3) where the GC thinks the pointers are. In theory, all of these can be extracted from an ordinary core file with DWARF data, but to do that, you need to walk the internal runtime structures. This means the heapdump library will need to change each time the runtime structs change, which could be annoying. The alternative idea is to embed (1,2,3) in a custom section of the ELF file using a stable format. This means less work for the heapdump library; however, it also means the heapdump library won't be able to process ordinary core files as generated by gcore.

I have a slight preference for supporting ordinary core files, but am curious what others think. Note that the x/debug library already has code to walk some of the runtime structures. There's probably an opportunity to share code with that library.

@matloob I think it would be good to start with your API interface (and as much of the code as is applicable to cores as well as heaps) in golang.org/x/tools/cmd/heapview/internal and build from there. What do you think?

SGTM. As @randall77 points out, that API I prototyped could use some pruning and cleaning. Don't be afraid to take a hatchet to it. Feel free to CC me on any CLs or assign me bits of work as they come up.

@randall77 We absolutely want to have the dump analyzer/viewer to be able to handle large heaps. At the same time, I think designing the analyzer to somehow process the dump with O(1) space is a hard research problem. I don't want us to tackle that in v1.

Agree with that completely, except I might replace "hard" with "fun and distracting" :-) The offline email thread with @alandonovan was more about the API than the implementation (not painting ourselves into a corner where the API becomes impossible to implement for large heaps).

aclements commented 8 years ago

The alternative idea is to embed (1,2,3) in a custom section of the ELF file using a stable format. This means less work for the heapdump library, however, it also means the heapdump library won't be able to process ordinary core files as generated by gcore.

The plan was not to have a special ELF section, but for the runtime to construct a data structure at a known symbol with the high-level information the heap dumper needs in a form that's convenient for the heap dumper. This would be easy to read out of the core file and wouldn't require any special core dump support. We definitely want ordinary system core dumps to work, both because that's easier for bootstrapping and because that's the only way you're going to get a core dump on OOM.

y3llowcake commented 7 years ago

Are supporting changes to the runtime in progress? Is there somewhere I can follow along or offer assistance?

As someone who runs a memory intensive production service, I can't express my desire for this enough.

tombergan commented 7 years ago

There are no supporting changes to the runtime in progress AFAIK. I've been working on a corefile-based heap debugger (https://github.com/tombergan/goheapdump) but it's still a ways from being usable and this is not my primary project, so progress is slow.

It's in theory possible to implement a corefile-based heap debugger without any runtime library changes, but the downside is you have to reimplement all of the logic to grok GC bitmaps in the corefile tool. This is basically what I'm doing. @aclements may have something more clever in mind, I'm not sure. The only deficiency I've noticed so far is incomplete DWARF info. For example, AFAICT, there is no DWARF info for free variables in closures -- I believe these are stored in *funcvals, although I'm not totally clear on the internal representation. Another example is that internal structures like runtime.arraytype are often missing from the DWARF, which makes walking itabs kind of annoying.

Dieterbe commented 6 years ago

Regarding the API, I think I have another nice use case for this. If we're able to walk all the pointers through the heap in the same way the GC does, then we have a nice way to represent GC workload and generate reports about which kinds of data structures account for most of the work, e.g. you can find out where you get the most bang for the buck when optimizing pointer-based structures into pointerless ones to shorten GC times. Similar to pprof profiles, where we assign weights to lines of code, we would here assign weights to memory locations and types. (Not sure how feasible this is and whether it makes sense?)

randall77 commented 6 years ago

@Dieterbe: Sure, number of pointers is a reasonable proxy for GC load. Objects, and subgraphs of objects, can be weighted by the number of pointers they contain.
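A small sketch of that weighting, assuming the heap has already been loaded into an in-memory graph. The node type and both function names are hypothetical, and pointer count is only the rough proxy for GC load described above.

```go
package main

import "fmt"

// node is a hypothetical in-memory view of one heap object.
type node struct {
	typ  string // type name of the object
	ptrs []int  // indices of the objects this object points to
}

// pointersByType sums the number of pointers held by objects of each type.
func pointersByType(heap []node) map[string]int {
	weights := make(map[string]int)
	for _, n := range heap {
		weights[n.typ] += len(n.ptrs)
	}
	return weights
}

// subgraphWeight returns the number of pointers contained in all objects
// reachable from root, counting each object once even if it is shared.
func subgraphWeight(heap []node, root int) int {
	seen := make([]bool, len(heap))
	seen[root] = true
	stack := []int{root}
	total := 0
	for len(stack) > 0 {
		i := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		total += len(heap[i].ptrs)
		for _, j := range heap[i].ptrs {
			if !seen[j] {
				seen[j] = true
				stack = append(stack, j)
			}
		}
	}
	return total
}

func main() {
	heap := []node{
		{typ: "list", ptrs: []int{1, 2}},
		{typ: "item", ptrs: []int{2}},
		{typ: "leaf"},
		{typ: "item", ptrs: []int{0}},
	}
	fmt.Println(pointersByType(heap))
	fmt.Println(subgraphWeight(heap, 0))
}
```

Sorting the per-type totals would give the kind of report @Dieterbe describes: the types at the top are the best candidates for pointer-free rewrites.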

I'm working on a library for reading core files; see https://github.com/golang/go/issues/21356. That library is intended for use by such tools. I'm planning on putting a reference example of such a tool in x/debug/cmd/coreview. But the intent is that others will be able to use the library to make more awesome tools than I can.

matloob commented 5 years ago

I haven't been able to work on this and am not planning to. I'm going to close this bug.
