dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Ideas about possible extensions to heap snapshot format #50546

Open mkustermann opened 1 year ago

mkustermann commented 1 year ago

One can view the heap snapshot the VM produces as a snapshot of the application at a specific point in time. There are benefits to analyzing such a heap snapshot outside the VM rather than doing many RPCs to the vm-service (which also offers some of this functionality, e.g. retaining paths, inspecting objects, ...).

In some sense it would be nice for the tools that analyze heap snapshots to have capabilities similar to those of a live debugger (except for actually running code, i.e. no expression evaluation).

Right now the heap snapshot format is quite minimal and pure in what it contains. That's a nice thing, but there are a few things we could do to make it more amenable to analysis and bring it closer to what one can do with a live application and the debugger / Observatory / ...:

/cc @rmacnak-google

mkustermann commented 1 year ago

/cc @polina-c

polina-c commented 1 year ago

Thank you, Martin, for initiating this.

Some thoughts:

  1. How about just flagging VM-internal objects instead of hiding them? They still take up memory, so it makes sense to keep them visible. Shouldn't it be the analyzer's task to decide what to show, unless hiding them saves performance?

  2. Other missing information:

mkustermann commented 1 year ago

How about just flagging VM-internal objects instead of hiding them?

Sure, that would work as well.

Though the individual algorithms (e.g. dominator calculation, retaining paths, successors/predecessors, ...) would then need to take this into account and possibly ignore such edges.
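To illustrate that cost, here is a minimal sketch (a Python model of the graph, not package:vm_service) of a retaining-path search over a snapshot where VM-internal objects are only flagged, not removed. The `Obj` type, the `is_vm_internal` flag, and the `retaining_path` helper are hypothetical names chosen for illustration; the point is that every such algorithm has to carry an equivalent skip check.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Obj:
    # Hypothetical model: indices of referenced objects, plus a flag marking
    # VM-internal objects instead of dropping them from the snapshot.
    references: List[int] = field(default_factory=list)
    is_vm_internal: bool = False


def retaining_path(objects: List[Obj], root: int, target: int) -> Optional[List[int]]:
    """Shortest path (BFS) from root to target that avoids VM-internal
    objects. Returns None if no such path exists."""
    parent: Dict[int, int] = {root: root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the parent links back to the root, then reverse.
            path = [current]
            while current != root:
                current = parent[current]
                path.append(current)
            return list(reversed(path))
        for succ in objects[current].references:
            # The extra check every algorithm needs once internal objects
            # are flagged rather than hidden.
            if objects[succ].is_vm_internal or succ in parent:
                continue
            parent[succ] = current
            queue.append(succ)
    return None


# Usage: 0 -> 1 (internal) -> 3 is ignored; 0 -> 2 -> 3 is found instead.
objects = [
    Obj(references=[1, 2]),
    Obj(references=[3], is_vm_internal=True),
    Obj(references=[3]),
    Obj(),
]
print(retaining_path(objects, root=0, target=3))  # [0, 2, 3]
```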

For object x referencing object y, it is missing which field(s) of x reference y

This information has been there for a long time. The field name of object.references[i] is graph.classes[object.classId].fields[i].name. The exception is mainly variable-sized objects such as arrays, which will only have field information for the array header. (There were some bugs in this information, but they were fixed, e.g. recently in d68ca2cc57302c64d535993bfc0e4cad4c6e51dc, 3669086a40814ba0cbc92436bd6c39dc4bf7b357)
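A minimal sketch of that lookup, written as a Python model rather than against package:vm_service. Only the relationship "the name of `object.references[i]` is `graph.classes[object.classId].fields[i].name`" comes from the comment above; the types `SnapshotField`, `SnapshotClass`, `SnapshotObject`, the helper `reference_labels`, the example classes, and the `[n]` labels for array elements are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnapshotField:
    index: int
    name: str


@dataclass
class SnapshotClass:
    name: str
    fields: List[SnapshotField]


@dataclass
class SnapshotObject:
    class_id: int
    references: List[int]  # indices of the referenced objects


def reference_labels(classes: List[SnapshotClass], obj: SnapshotObject) -> List[str]:
    """Label each outgoing reference of obj with the field it comes from."""
    fields = classes[obj.class_id].fields
    labels = []
    for i in range(len(obj.references)):
        if i < len(fields):
            # Fixed-size part: the name of references[i] is fields[i].name.
            labels.append(fields[i].name)
        else:
            # Variable-sized objects (e.g. arrays) only describe their header,
            # so the remaining references are labeled by element position.
            labels.append(f"[{i - len(fields)}]")
    return labels


# Usage with hypothetical classes.
classes = [
    SnapshotClass("Point", [SnapshotField(0, "x"), SnapshotField(1, "y")]),
    SnapshotClass("Array", [SnapshotField(0, "length")]),
]
point = SnapshotObject(class_id=0, references=[5, 6])
array = SnapshotObject(class_id=1, references=[7, 8, 9])
print(reference_labels(classes, point))  # ['x', 'y']
print(reference_labels(classes, array))  # ['length', '[0]', '[1]']
```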

a. Is there a difference between references and successors, or is it just a different format of the same information?

That's not a question of the heap format (which this issue is about), but rather of the API that package:vm_service's HeapSnapshotGraph exposes. The main difference, I believe, is that one of them is a compact Uint32List view while the other is a rather inefficient sync* function yielding actual HeapSnapshotObjects.
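As a rough analogy in Python (not the vm_service API), the two shapes of the same information look like this: a compact array of unsigned 32-bit object indices versus a lazy generator that resolves each index to an object. The `Graph` and `Node` classes and their attributes are hypothetical stand-ins.

```python
from array import array
from typing import Iterator, List


class Graph:
    def __init__(self) -> None:
        self.objects: List["Node"] = []


class Node:
    """Hypothetical stand-in for a heap snapshot object."""

    def __init__(self, graph: Graph, name: str, references: List[int]) -> None:
        self.graph = graph
        self.name = name
        # Compact form: unsigned 32-bit object indices, similar in spirit
        # to a Uint32List.
        self.references = array("I", references)
        graph.objects.append(self)

    @property
    def successors(self) -> Iterator["Node"]:
        # Convenience form: lazily resolve each index to an object, similar
        # in spirit to a Dart sync* getter. Same data, more per-item overhead.
        for index in self.references:
            yield self.graph.objects[index]


# Usage: objects are numbered in creation order (a=0, b=1, c=2).
g = Graph()
a = Node(g, "a", [1, 2])
b = Node(g, "b", [])
c = Node(g, "c", [])
print(list(a.references))              # [1, 2]
print([s.name for s in a.successors])  # ['b', 'c']
```

Analyses that only need indices (dominators, reachability) can work on the compact form directly and avoid materializing an object per edge.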

polina-c commented 1 year ago

For object x referencing object y, it is missing which field(s) of x reference y

This information has been there for a long time. The field name of object.references[i] is graph.classes[object.classId].fields[i].name.

If two objects are of the same class, do they have the same classId? If yes, that would mean that all instances of the same class have the same field name for the same reference index, e.g. all instances of class X have the same field name for reference #1. Is that how it is structured?

mkustermann commented 1 year ago

If two objects are of the same class, do they have the same classId?

Yes, all classes are assigned a number which we call class-id. An object is an instance of a class. The objects don't point to their class, but they store the id of the class they are an instance of.

If yes, that would mean that all instances of the same class have the same field name for the same reference index, e.g. all instances of class X have the same field name for reference #1. Is that how it is structured?

I'm not quite sure where the misunderstanding is. Let me try to rephrase it:

Each class has a list of fields. The fields are densely 0-indexed. The index of a field says where in an object's outgoing references the field's value is. I.e. if you want to know the value of field class.fields[i] in object x, you can get it via x.references[class.fields[i].index], which I believe can be simplified to x.references[i] (because class.fields[i].index == i). (As mentioned, the only exception is variable-sized objects such as arrays, where <obj>.references can be longer than the number of fields.)
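A minimal Python model of that explanation, assuming hypothetical types `Field`, `Klass`, `Instance` and helpers `get_field` / `extra_references` (none of these are package:vm_service names). It shows that two instances of the same class share the field name at each reference index while holding different values there, and that only variable-sized objects have references beyond their declared fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    index: int  # position of this field's value in an object's references
    name: str


@dataclass
class Klass:
    name: str
    fields: List[Field]


@dataclass
class Instance:
    class_id: int  # objects store the id of their class, not a pointer to it
    references: List[int]


def get_field(classes: List[Klass], x: Instance, field_name: str) -> int:
    """Return the object index stored in x's field named field_name."""
    for f in classes[x.class_id].fields:
        if f.name == field_name:
            # General form: x.references[cls.fields[i].index]. With densely
            # 0-indexed fields, f.index equals f's position in the list.
            return x.references[f.index]
    raise KeyError(field_name)


def extra_references(classes: List[Klass], x: Instance) -> List[int]:
    """References beyond the declared fields; non-empty only for
    variable-sized objects such as arrays."""
    return x.references[len(classes[x.class_id].fields):]


# Usage with hypothetical classes: both points name reference 1 "y",
# but each stores a different object there.
classes = [
    Klass("Point", [Field(0, "x"), Field(1, "y")]),
    Klass("Array", [Field(0, "length")]),
]
p1 = Instance(class_id=0, references=[10, 11])
p2 = Instance(class_id=0, references=[12, 13])
arr = Instance(class_id=1, references=[20, 21, 22])
print(get_field(classes, p1, "y"), get_field(classes, p2, "y"))  # 11 13
print(extra_references(classes, arr))                            # [21, 22]
```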

polina-c commented 1 year ago

It helps. Thank you.