godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.12k stars 69 forks source link

Allow snapshotting the ObjectDB to help check for memory leaks #10628

Open AleksLitynski opened 2 weeks ago

AleksLitynski commented 2 weeks ago

Describe the project you are working on

I just finished working on a game jam project (https://tavoe.itch.io/nood-dude-skydiving-league). I've run into similar issues on previous game jam projects.

Describe the problem or limitation you are having in your project

GDScript is not garbage collected, and from time to time I'll leak Nodes or other Objects without realizing it. The most common way I do this is by calling remove_child without calling queue_free.

More broadly, I'm just paranoid I'm leaking objects and have no real way to check.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

I'd like to add a new panel to the debugger that can snapshot the current state of the ObjectDB and help visualize if any objects have been leaked. I come from a web development background, and chrome's developer tools have a similar feature. chrome_heap_snapshots

If you attach a debugger like gdb to your Godot game, it's possible to inspect the content of the ObjectDB and more, but if you're just working within the Godot Editor, there's not much recourse. There is the Node::print_orphan_nodes function, which is definitely a good start, but I think adding a bit of dedicated UI for this use case could be nice.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

To save everyone some time, I implemented my proposal here - https://github.com/AleksLitynski/godot/commit/3cd01b2f2b6a6fa0f291a09a40d8788f5b94d3ef. It needs a lot of cleaning up, but it is fully functional.

The proposed changes are:

  1. Add a new panel to the debugger called "Object Snapshot" where all the new UI can go
  2. Add a button "Take Object Snapshot" to the panel
  3. When the button is pressed, send a request to the game client to dump the state of it's ObjectDB and send it back to the editor
  4. Parse the snapshot, and populate several tabs in the Object Snapshot panel to help the user understand the content of the snapshot.
  5. Have a list of past snapshots so users can diff the snapshots over time.

Screenshots of the UI for clarification:

  1. When the game is not running, the panel is largely empty: non_running_game

  2. When the game is running, the user can press "Take Object Snapshot" to snapshot the ObjectDB: summary_view

  3. I thought of 6 interesting ways to visualize the object snapshot -

    • Summary View: Highlight possible memory issues, like cycles in RefCounted objects or Nodes that aren't part of the main trere.
    • Classes View: Groups the objects by class and presents them as a tree. If the user clicks on an object, displays the snapshotted object in the inspector.
    • Objects View: This is just a list of every object in the game.
    • Nodes View: This is basically the same as the Scene view that's already in the editor, except that it also shows orphaned Node trees.
    • RefCounted View: This shows only RefCounted objects, and walks the object graph to find any cycles. Because many references are held in the engine itself, this isn't fullproof.
    • Raw View: This dumps the snapshot to json so users can copy it and analyse it on their own.

I'm not including a screenshot of every tab, but here's one of the classes view: classes_view

A major drawback to this tool is that only a small subset of the total data in a Godot game is exposed through the ObjectDB. The data in the ObjectDB is the data of most interested to end users, though, so I think a tool like this is reasonable.

If this enhancement will not be used often, can it be worked around with a few lines of script?

I tried to implement this feature as a plugin, and was almost able to. The problem is ObjectDB::debug_objects is not exposed to GDScript or GDExtensions, so I had to make a module. If we exposed ObjectDB::debug_objects to GDExtensions, this proposal could probably just be a plugin.

Additionally, Node::print_orphan_nodes exists, and probably covers 90% of use cases with1% of the code.

Is there a reason why this should be core and not an add-on in the asset library?

This could be an asset if we expose ObjectDB::debug_objects to scripting. I suspect there's a reason we don't already expose that method to scripting, though.

More than that, I think this is the kind of tool a lot of people would find helpful from time to time, but wouldn't go out of their way to install.

timothyqiu commented 2 weeks ago

See also https://github.com/godotengine/godot/pull/83757

mihe commented 2 weeks ago

I think something like this is sorely needed, and would to some degree serve the same need as what Unity has with its built-in memory profiler, which I would encourage anyone tackling this to take a look at for inspiration (see brief overview here).

Personally I think this is important enough that it deserves being a core feature rather than an add-on/extension.

Have a list of past snapshots so users can diff the snapshots over time.

I tried your branch and I'm not seeing any diffing capabilities within the tool itself. Are you suggesting users would diff it "manually" or did I perhaps miss some button somewhere? Being able to select two or more snapshots and see the objects that were added between the first and last snapshots would be an almost must-have for a tool like this, in my opinion, but I suppose that could be added later down the line as well.

Beyond that I have some off-the-cuff remarks:

  1. The "Count" column in the "Classes" view is a bit confusing. I would expect the count for Object to encompass every derived object as well, but instead it seems to only include actual instantiations of Object? Could there be two columns perhaps, like "Count (incl.)" and "Count (excl.)"?
  2. It strikes me as useful to have the accumulated counts in the "RefCounted" view displayed in the top-level items. The same might be true for the "Inbound" and "Outbound" counts in the "Objects" view.
  3. It definitely needs to be possible to sort the views by all the various counts.
  4. I was able to crash the running game while taking a snapshot after having changed the process_mode of a RigidBody3D to PROCESS_MODE_DISABLED (with disable_mode at its default of DISABLE_MODE_REMOVE) through the remote Scene dock.
  5. I get errors saying Cannot get class 'GDScriptFunctionState' (and similar) when taking a snapshot. This was when snapshotting during a GDScript await.
  6. I get errors saying Unicode parsing error, some characters were replaced with � (U+FFFD): NUL character when taking a snapshot. It seems to stem from not casting i to String here.

All-in-all this looks quite polished and well-made, especially for (what seems to be) a first contribution. I would suggest you make this into a draft pull request (seeing as how you have it ready to go already) and the various implementation-specific details/issues can be ironed out there.

AleksLitynski commented 1 week ago

Thanks for talking a look at this!

To answer your questions -

I just meant manually diffing. I'd like to add some kind of visual diff, but I wanted to stop and put up a proposal before I kept working on it. If this generally looks good, I'm happy to go back and add things like visual diffing.

  1. Same deal with summing up the tree items in the classes view. I agree that an inclusive count would make more sense, I just didn't think it would make or break a proposal. I think maybe the inclusive count should go in the tree, then the exclusive count can go in the list on the right after you click a tree item?
  2. That's a good idea.
  3. Yeah. Sorting and filtering would both be good.
  4. Interesting.
  5. Also interesting. I worry about how many edge cases snapshotting will end up having.
  6. Will fix.

I watched the unity video. I see three important things they're doing that I'm not doing right now -

  1. Persisting snapshots to disk. This is probably a good idea, although I'm not sure where they should be saved to. It also means I'll have to think about versioning snapshots across godot versions.
  2. Snapshot diffing. Should be easy to do, and seems very useful.
  3. Measuring objects. This is the big one. The unity profiler is showing how much memory each object is taking up, and helps you know if you've overshot your texture budget. I went down the rabbit hole of doing this in godot, but I worry I don't know enough to implement it correctly. Here's where I got to:
    1. Say you have an ImageTexture. The class has a few properties, like width, height, and an RID for the image it represents. I can measure the ImageTexture with sizeof, but it would be pretty small.
    2. ImageTexture doesn't have any properties, so from the perspective of the ObjectDB, it's empty.
    3. This isn't that bad, because the underlying Image is also an object, so it'll end up in the snapshot too. We will eventually measure the Image, we just won't know it's connected to the ImageTexture. I'm not sure if you noticed this, but when I serialize RefCounteds, I explicitly include the ref count. It's not actually a property of the object, so it's not automatically included in the snapshot. I could add the image's RID to the ImageTexture in the same way, but I worry once I start adding one off properties for random classes, I'll never stop.
    4. The real problem comes when we measure the Image. It stores the actual image data in a Vector<uint8_t>. Vectors are not Objects and they're heap allocated, so the memory is basically invisible to this tool.
    5. Instead of pulling data from the ObjectDB, we could go a layer lower and get it directly from Memory.h. Godot supports pluggable allocators, but we wrap calls to the allocator in the memnew macro. We could probably add some instrumentation to that to track how many instances of each class we've allocated. I'm fairly scared to touch Memory.h, though. Unity is also breaking memory down by resident vs allocated memory. I think we would have to access the allocator's internals to know things like that. More annoying than that, we would just know an array of uint8_t had been allocated. I don't know how we would map that back to the Image that owns it, much less the ImageTexture. The Memory class does expose get_mem_available(), get_mem_usage(), and get_mem_max_usage() functions. It would be easy to include at least that coarse grained data in snapshots.

I can put up a draft pull request with what I have here. I can also break this up into 3 smaller pull requests if that would be easier. One to fix up some Godot internal, another one to add the skeleton of the module, and a third to populate the module with all the different views. Any preference between those two?

mihe commented 1 week ago

I think maybe the inclusive count should go in the tree, then the exclusive count can go in the list on the right after you click a tree item?

I'm not sure I understand what you mean by "go in the tree", but so long as you can readily see and sort by both counts it sounds fine to me.

Persisting snapshots to disk. This is probably a good idea, although I'm not sure where they should be saved to. It also means I'll have to think about versioning snapshots across godot versions.

Yes, persisting to disk would definitely be useful. As for the path, somewhere in the OS.get_user_data_dir() (aka user:// aka %APPDATA%/Godot/app_userdata/[project] on Windows) would seem to make sense to me. That's where you'll find stuff like the logs and shader caches, etc.

Measuring objects. This is the big one. The unity profiler is showing how much memory each object is taking up, and helps you know if you've overshot your texture budget. I went down the rabbit hole of doing this in godot, but I worry I don't know enough to implement it correctly.

I don't speak for the Godot production/core team, but I personally feel like this would be a case of not letting perfection be the enemy of good. Just having the ability to view/diff object counts will be a boon to many bigger projects. Getting detailed breakdowns of the actual memory usage per-object, while of course very useful, can probably wait (again, in my opinion).

The Memory class does expose get_mem_available(), get_mem_usage(), and get_mem_max_usage() functions. It would be easy to include at least that coarse grained data in snapshots.

Yeah, I don't see the harm in this. It could be useful. Maybe display it as part of the summary or something.

I can put up a draft pull request with what I have here. I can also break this up into 3 smaller pull requests if that would be easier. One to fix up some Godot internal, another one to add the skeleton of the module, and a third to populate the module with all the different views. Any preference between those two?

In my experience, and judging by your branch, I would say just make one single PR for everything. The stuff outside of modules/snapshot_debugger isn't all that much, and it streamlines the process and makes it clearer why the stuff outside of modules/snapshot_debugger is being changed.