Zylann / godot_voxel

Voxel module for Godot Engine
MIT License
2.63k stars 247 forks source link

Hot CPU spin removing noise texture from voxel instancer #422

Open rcorre opened 2 years ago

rcorre commented 2 years ago

Godot: 4.0.alpha.custom_build.8243c7ab5 Voxel: 31a7ddbf838572e50415159a56720275f9523262

When I try to remove the noise texture from the voxel instancer in the attached project, Godot consumes 100% CPU for around a minute, then crashes.

It prints messages like:

Failed method: SceneTreeEditor::_warning_changed
TOTAL BYTES: 4194288
NULL count: 0
CALL EditorPropertyResource::_update_callback: 1
CALL SceneTreeEditor::_warning_changed: 9733
CALL SceneTreeEditor::_warning_changed: 9733
CALL SceneTreeEditor::_warning_changed: 9733
CALL SceneTreeEditor::_warning_changed: 9733
CALL SceneTreeEditor::_warning_changed: 9732
CALL SceneTreeEditor::_warning_changed: 9732
CALL SceneTreeEditor::_warning_changed: 9732
CALL SceneTreeEditor::_warning_changed: 9732
CALL Node3DEditor::_request_gizmo: 9516
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL SceneTreeEditor::_test_update_tree: 1
CALL RichTextLabel::_update_callback: 1
ERROR: Message queue out of memory. Try increasing 'memory/limits/message_queue/max_size_kb' in project settings.

I sampled a few stacks throughout, they tend to have HashMap/MessageQueue near the top:

(gdb) bt
#0  Callable::operator String (this=this@entry=0x7ffff066f598) at core/variant/callable.cpp:239
#1  0x000055555a2f1a6b in HashMap<Callable, int, HashMapHasherDefault, HashMapComparatorDefault<Callable>, DefaultTypedAllocator<HashMapElement<Callable, int> > >::_hash (p_key=..., this=0x7fffffffba80) at ./core/templates/hash_map.h:84
#2  HashMap<Callable, int, HashMapHasherDefault, HashMapComparatorDefault<Callable>, DefaultTypedAllocator<HashMapElement<Callable, int> > >::_lookup_pos (r_pos=<synthetic pointer>: 0, p_key=..., this=0x7fffffffba80) at ./core/templates/hash_map.h:106
#3  HashMap<Callable, int, HashMapHasherDefault, HashMapComparatorDefault<Callable>, DefaultTypedAllocator<HashMapElement<Callable, int> > >::has (p_key=..., this=0x7fffffffba80) at ./core/templates/hash_map.h:310
#4  MessageQueue::statistics (this=0x55555d27b000) at core/object/message_queue.cpp:159
#5  0x000055555a2f365e in MessageQueue::push_callablep (this=0x55555d27b000, p_callable=..., p_args=p_args@entry=0x7fffffffbd30, p_argcount=p_argcount@entry=1, p_show_error=p_show_error@entry=true) at core/object/message_queue.cpp:121
#6  0x000055555a2ff00d in Object::emit_signalp (this=this@entry=0x555560f98ab0, p_name=..., p_args=p_args@entry=0x7fffffffbd30, p_argcount=p_argcount@entry=1) at core/object/object.cpp:987
#7  0x0000555558584200 in Object::emit_signal<Node*> (p_name=..., this=0x555560f98ab0) at ./core/object/object.h:817
#8  Node::update_configuration_warnings (this=0x55557c2d0ae0) at scene/main/node.cpp:2733
#9  Node::update_configuration_warnings (this=this@entry=0x55557c2d0ae0) at scene/main/node.cpp:2727
#10 0x0000555558a648d9 in RigidDynamicBody3D::_notification (this=this@entry=0x55557c2d0ae0, p_what=p_what@entry=44) at scene/3d/physics_body_3d.cpp:606
#11 0x000055555636163b in RigidDynamicBody3D::_notificationv (p_reversed=false, p_notification=44, this=0x55557c2d0ae0) at ./scene/3d/physics_body_3d.h:133
#12 zylann::voxel::VoxelInstancerRigidBody::_notificationv (this=0x55557c2d0ae0, p_notification=44, p_reversed=<optimized out>) at modules/voxel/terrain/instancing/voxel_instancer_rigidbody.h:11
#13 0x000055555a2f54a4 in Object::notification (this=0x55557c2d0ae0, p_notification=44, p_reversed=<optimized out>) at core/object/object.cpp:737
#14 0x0000555558a396c0 in Node3D::set_transform (p_transform=..., this=<optimized out>) at scene/3d/node_3d.cpp:268
#15 0x0000555556354992 in zylann::voxel::VoxelInstancer::update_block_from_transforms (this=0x55557551d130, block_index=<optimized out>, transforms=..., grid_position=..., layer=..., item_base=..., layer_id=1, world=..., block_transform=...)
    at modules/voxel/terrain/instancing/voxel_instancer.cpp:975
#16 0x0000555556356858 in zylann::voxel::VoxelInstancer::regenerate_layer (this=<optimized out>, layer_id=<optimized out>, regenerate_blocks=<optimized out>) at modules/voxel/terrain/instancing/voxel_instancer.cpp:575
#17 0x000055555635709f in zylann::voxel::VoxelInstancer::on_library_item_changed (this=0x55557551d130, item_id=<optimized out>, change=<optimized out>) at modules/voxel/terrain/instancing/voxel_instancer.cpp:650
#18 0x000055555633d64d in non-virtual thunk to zylann::voxel::VoxelInstanceLibrary::on_library_item_changed(int, zylann::voxel::VoxelInstanceLibraryItem::ChangeType) () at modules/voxel/terrain/instancing/voxel_instance_library.h:54
#19 0x0000555556341bd4 in zylann::voxel::VoxelInstanceLibraryItem::notify_listeners (this=0x55557147bc10, change=zylann::voxel::VoxelInstanceLibraryItem::CHANGE_GENERATOR) at modules/voxel/terrain/instancing/voxel_instance_library_item.cpp:78
#20 0x000055555a2ff2b2 in Object::emit_signalp (this=<optimized out>, p_name=..., p_args=p_args@entry=0x0, p_argcount=p_argcount@entry=0) at core/object/object.cpp:992
#21 0x0000555559f9e8e8 in Object::emit_signal<>(StringName const&) (p_name=..., this=<optimized out>) at ./core/variant/variant.h:211
#22 Resource::emit_changed (this=<optimized out>) at core/io/resource.cpp:44
#23 0x00005555562f5e07 in call_with_variant_args_helper<__UnexistingClass, Ref<Noise>, 0ul> (r_error=..., p_args=<synthetic pointer>, p_method=<optimized out>, p_instance=<optimized out>) at ./core/variant/binder_common.h:262
#24 call_with_variant_args_dv<__UnexistingClass, Ref<Noise> > (default_values=..., r_error=..., p_argcount=<optimized out>, p_args=<optimized out>, p_method=<optimized out>, p_instance=<optimized out>) at ./core/variant/binder_common.h:409
#25 MethodBindT<Ref<Noise> >::call (this=<optimized out>, p_object=<optimized out>, p_args=<optimized out>, p_arg_count=<optimized out>, r_error=...) at ./core/object/method_bind.h:320
#26 0x000055555a2d8fcf in ClassDB::set_property (p_object=p_object@entry=0x55557340e360, p_property=..., p_value=..., r_valid=r_valid@entry=0x0) at core/object/class_db.cpp:1049
#27 0x000055555a300a41 in Object::set (this=this@entry=0x55557340e360, p_name=..., p_value=..., r_valid=r_valid@entry=0x0) at core/object/object.cpp:248
#28 0x000055555a32ee82 in UndoRedo::_process_operation_list (this=<optimized out>, E=0x55557fe89740) at core/object/undo_redo.cpp:341
#29 0x000055555a32f5a9 in UndoRedo::_redo (this=0x5555610e85f0, p_execute=<optimized out>) at core/object/undo_redo.cpp:75
#30 0x000055555a32f774 in UndoRedo::commit_action (p_execute=true, this=0x5555610e85f0) at core/object/undo_redo.cpp:296
#31 UndoRedo::commit_action (this=0x5555610e85f0, p_execute=p_execute@entry=true) at core/object/undo_redo.cpp:283
#32 0x00005555576a2f1e in EditorInspector::_edit_set (this=0x55556f0b07b0, p_name=..., p_value=..., p_refresh_all=<optimized out>, p_changed_field=...) at editor/editor_inspector.cpp:3449
#33 0x00005555576a345c in EditorInspector::_property_changed (this=0x55556f0b07b0, p_path=..., p_value=..., p_name=..., p_changing=<optimized out>, p_update_all=<optimized out>) at editor/editor_inspector.cpp:3466
#34 0x00005555576b5390 in call_with_variant_args_helper<EditorInspector, String const&, Variant const&, String const&, bool, bool, 0ul, 1ul, 2ul, 3ul, 4ul> (r_error=..., p_args=<optimized out>, p_method=<optimized out>, p_instance=<optimized out>)
    at ./core/variant/binder_common.h:262
#35 call_with_variant_args<EditorInspector, String const&, Variant const&, String const&, bool, bool> (r_error=..., p_argcount=<optimized out>, p_args=<optimized out>, p_method=<optimized out>, p_instance=<optimized out>) at ./core/variant/binder_common.h:376
#36 CallableCustomMethodPointer<EditorInspector, String const&, Variant const&, String const&, bool, bool>::call (this=<optimized out>, p_arguments=<optimized out>, p_argcount=<optimized out>, r_return_value=..., r_call_error=...)
    at ./core/object/callable_method_pointer.h:104
#37 0x000055555a090a07 in CallableCustomBind::call (this=0x55558004d350, p_arguments=<optimized out>, p_argcount=4, r_return_value=..., r_call_error=...) at core/variant/callable_bind.cpp:99
#38 0x000055555a2ff2b2 in Object::emit_signalp (this=this@entry=0x555580045cc0, p_name=..., p_args=p_args@entry=0x7fffffffcc80, p_argcount=p_argcount@entry=4) at core/object/object.cpp:992
#39 0x00005555576793bf in EditorProperty::emit_changed (this=0x555580045cc0, p_property=..., p_value=..., p_field=..., p_changing=<optimized out>) at editor/editor_inspector.cpp:119
#40 0x00005555576883d6 in EditorProperty::gui_input (this=0x555580045cc0, p_event=...) at editor/editor_inspector.cpp:676
#41 0x00005555585e5fe6 in Viewport::_gui_call_input (this=0x555560f95990, p_control=<optimized out>, p_input=...) at scene/main/viewport.cpp:1288
#42 0x00005555585ec547 in Viewport::_gui_input_event (this=0x555560f95990, p_event=...) at ./core/object/ref_counted.h:215
#43 0x0000555558601c7a in Viewport::push_input (this=this@entry=0x555560f95990, p_event=..., p_local_coords=p_local_coords@entry=false) at scene/main/viewport.cpp:2730
#44 0x00005555586219f2 in Window::_window_input (this=0x555560f95990, p_ev=...) at scene/main/window.cpp:1020
#45 0x000055555862dee6 in call_with_variant_args_helper<Window, Ref<InputEvent> const&, 0ul> (r_error=..., p_args=<optimized out>, p_method=<optimized out>, p_instance=<optimized out>) at ./core/variant/binder_common.h:262
#46 call_with_variant_args<Window, Ref<InputEvent> const&> (r_error=..., p_argcount=<optimized out>, p_args=<optimized out>, p_method=<optimized out>, p_instance=<optimized out>) at ./core/variant/binder_common.h:376
#47 CallableCustomMethodPointer<Window, Ref<InputEvent> const&>::call (this=<optimized out>, p_arguments=<optimized out>, p_argcount=<optimized out>, r_return_value=..., r_call_error=...) at ./core/object/callable_method_pointer.h:104
#48 0x00005555560bd796 in DisplayServerX11::_dispatch_input_event (this=0x55555d6a1530, p_event=...) at platform/linuxbsd/display_server_x11.cpp:3184
#49 0x000055555a02706e in Input::_parse_input_event_impl (this=<optimized out>, p_event=..., p_is_emulated=<optimized out>) at core/input/input.cpp:663
#50 0x000055555a029388 in Input::flush_buffered_events (this=0x55555d5ed340) at ./core/templates/list.h:115
#51 0x00005555560c0ef7 in DisplayServerX11::process_events (this=0x55555d6a1530) at platform/linuxbsd/display_server_x11.cpp:4173
#52 0x00005555560a22a8 in OS_LinuxBSD::run (this=this@entry=0x7fffffffd6a0) at platform/linuxbsd/os_linuxbsd.cpp:533
#53 0x0000555556093e93 in main (argc=<optimized out>, argv=0x7fffffffdbd8) at platform/linuxbsd/godot_linuxbsd.cpp:72

example.zip

Zylann commented 2 years ago

Having a look.

Notes:

It's hard to tell yet at first what is going on but it seems like Godot is simply very badly handling the fact many rigidbody transforms are set at once in the editor (they are created for collisions). It tries to update configuration warnings of ALL the bodies which in turn is emitting signals which go to message queue and it's a callback-fest of deferred stuff going into the message queue, ultimately leading to... nothing, because these nodes are not editable in the scene tree anyways...

There are many bodies because when you remove the noise, the instancer will no longer attempt to filter density anymore. So it will spawn a lot more instances with collision in every chunk, so you'll end up with hundreds of physics bodies in your scene. While the physics engine might be able to handle this, the editor however is doing really inefficient things under the hood that kill performance.

I tried bumping the size of the message queue to 400Mb instead of the default 4Mb to see where it leads. Now removing the noise takes a second with no issues. image

I had a look at Node::update_configuration_warnings and it seems like it keeps emitting signals even if the node cannot be seen in the editor. I tried to change it, and it also fixed the issue:

#ifdef TOOLS_ENABLED
inline bool can_node_be_seen_in_editor(Node &node) {
    if (!node.is_inside_tree()) {
        return false;
    }
    const Node *edited_scene_root = node.get_tree()->get_edited_scene_root();
    if (edited_scene_root == nullptr) {
        return false;
    }
    if (edited_scene_root == &node) {
        return true;
    }
    // Not sure why `is_ancestor_of` is required
    return edited_scene_root->is_ancestor_of(&node) && node.get_owner() == edited_scene_root;
}
#endif

void Node::update_configuration_warnings() {
#ifdef TOOLS_ENABLED
    if (can_node_be_seen_in_editor(*this)) {
        get_tree()->emit_signal(SceneStringNames::get_singleton()->node_configuration_warning_changed, this);
    }
#endif

Using signals like this isn't great though IMO. Because if these bodies happened to be visible instead (which is possible in other games with large scenes, who knows), the problem will also happen as soon as you do a bulk move. A better solution would be to store a version number which the scene tree dock can poll on visible nodes (never more than a few dozens). When it compares different to the last shown, update it. No faffing around with signals. But that sounds like a larger change to make.

rcorre commented 2 years ago

your instances.tres contains images that went embedded directly inside the file as text. This will blow up the size of the resource (14Mb) and makes it slower to load. Maybe try not creating new ImageTextures from the inspector? It also contains mesh data. Or could it be that you tried to "make unique"? I would also advise against that on a mesh. I could not find any NoiseTexture in the entire project. I guess it is the images you created? Or FastNoiseLite?

Sorry for not clarifying. Here's what I did to get to this point:

I never deliberately created an ImageTexture or made anything unique. I'm guessing the mesh data is just how "Update from Scene" works, right? It doesn't seem to reference the original scene. As far as the noise textures go, are you always supposed to create them as separate resources? To me clicking "New FastNoiseLite" on the "noise" slot of the generator seemed like a pretty "normal" way to edit an instance generator.

Thanks for tracking down the godot issue! I'll try bumping the queue size for now.

Zylann commented 2 years ago

remembered I need to save the VoxelInstanceLibrary as a separate resource for now

You dont need to do that. The workaround is to right-click on the resource property and choose Edit, which will open it in a full inspector instead of a sub-inspector. The sub-inspector is why the issue occurs. But it's still nice to save this as its own file, because it can be a large resource.

I'm guessing the mesh data is just how "Update from Scene" works, right?

Not really, it wasn't intented. It could be that your scene originally had this ImageTexture inside of it? I think the issue is because Godot imported your .glb as a .scn under the hood, but decided to embed everything inside of it. As a result, since the instancer conversion only needed the mesh (so it could give it to MultiMesh), Godot found that this mesh had no file of its own (since it came built-in inside a scene), and so it made a copy of it. Same story with materials and textures inside of them. This completely sucks but I'm not sure how it could be improved. Having it automatically update if the scene changes would be desirable as well, so perhaps it needs to keep a reference to the scene and convert at runtime...

rcorre commented 2 years ago

I'm not sure if this is the same issue a different one. I bumped the queue size to 409600. When I open my world scene, the CPU is relatively low at idle (< 5%).

When I select the VoxelLodTerrain or the VoxelInstancer in the editor, CPU usage climbs to 100%. If I select another node, the CPU drops to near-zero again. example.zip

It doesn't require me to change or regenerate anything, and the camera is positioned exactly the same. It's just a matter of whether or not the VoxelLodTerrain is the selected node in the editor.

https://user-images.githubusercontent.com/2496231/184543896-61273c2d-3e48-4317-a2e6-730a6958166e.mp4

Zylann commented 2 years ago

If it doesn't freeze, I'm not sure what could cause all CPU cores to be used at 100% by just having either selected... it is possible for the module to use that much CPU if it has a lot of tasks running AND its threading configuration is changed to use the maximum, but by default it uses half only, and there doesn't seem to be any task running in your video (camera doesnt move and terrain likely finished loading a while ago). I also haven't noticed this on my computer. The only thing the module does when nodes are selected is to update realtime gizmos (which is done on the main thread only and is usually very cheap), which however dont seem to be enabled in your video. I'm curious where the CPU is spending its time here.

rcorre commented 2 years ago

I can keep looking into it. It does seem to be a separate issue -- https://github.com/godotengine/godot/pull/64363 fixes the OP for me, but not this new issue.

rcorre commented 2 years ago

all CPU cores to be used at 100%

Just in case you don't know, 100% on Linux (at least in top) is 1 core. All cores would be 1200% on my CPU.

rcorre commented 2 years ago

perf sample:

    20.18%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] Node3D::get_transform
    19.40%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] __dynamic_cast
    16.92%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] Node::get_child_count
    10.12%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] __cxxabiv1::__si_class_type_info::__do_dyncast
     9.14%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] Node3DEditorViewport::_calculate_spatial_bounds
     4.15%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff2b
     2.67%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015fef0
     2.13%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015fef4
     2.09%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff31
     1.87%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff17
     1.64%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff21
     1.27%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff2f
     1.24%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff27
     1.13%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000015ff13
     1.09%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] Node::propagate_notification
     0.81%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] Node::get_child
     0.77%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] __cxxabiv1::__vmi_class_type_info::__do_dyncast
     0.69%  godot.linuxbsd.  godot.linuxbsd.opt.tools.64  [.] Object::notification
     0.24%  godot.linuxbsd.  libc.so.6                    [.] 0x000000000016023f
Zylann commented 2 years ago

Just in case you don't know, 100% on Linux (at least in top) is 1 core. All cores would be 1200% on my CPU.

In which case this is less concerning, but I'm still not aware of any task in the module requiring a full core to run constantly while a node is selected.

Not sure how to interpret your sampling, but it doesn't indicate a place in the module. An hypothesis could be that Godot is trying to figure out the bounding box of the selected node (every frame, for some reason) based on all its children, and since there is a fuckton of them, takes a while to do so. All this probably to show a bounding box around the selection maybe. I wouldn't think of this being slow even with hundreds of nodes, but maybe the operation isn't well optimized to begin with.

rcorre commented 2 years ago

Sorry, that's not too helpful as is, I was running an optimized build. I might be able to get a call stack if I recompile. Would you like me to open a separate issue? I think the OP can be closed with your fix to Godot.

Zylann commented 2 years ago

I'm not sure if that can be considered an issue though. It's just the Godot Editor doing something here which happens to be very slow. I have no control over this. Or I would have to migrate all these nodes to use servers directly, which however prevents some features from working easily such as obtaining the collider in game or displaying debug collision shapes.

19.40% godot.linuxbsd. godot.linuxbsd.opt.tools.64 [.] dynamic_cast 16.92% godot.linuxbsd. godot.linuxbsd.opt.tools.64 [.] Node::get_child_count 10.12% godot.linuxbsd. godot.linuxbsd.opt.tools.64 [.] cxxabiv1::si_class_type_info::do_dyncast

These make me sad. get_child_count() should just return a field. Why was it sampled so many times? Maybe it's directly called inside for loops and the compiler wasnt able to optimize it? dynamic_cast is hard to avoid because in Godot the scene tree can contain random stuff, not just 3D nodes. But a possible optimization is to have a is_node3d shortcut virtual method which would likely be faster than a full-blown dynamic cast. As for get_transform, depends what the node does, but I'd expect it to be only a field return as well. But maybe it's doing more.

rcorre commented 2 years ago

It's just the Godot Editor doing something here which happens to be very slow. I have no control over this

It probably is, though so far I can only repro it with voxel_tools. I tried adding 1000s of collisionshape children to a body, and adding 1000s of instances to a MultiMesh, and neither reproes.

After playing around some more, it seems this happens whenever I select the VoxelInstancer or any node which is a parent of the VoxelInstancer (the terrain or the root node). If I select a sibling, or a child of the VoxelInstancer, CPU returns to normal.

Here's an unoptimized profile with call stacks:

profile.txt

Zylann commented 2 years ago

That makes sense because VoxelInstancer is the only one which actually creates child nodes for colliders. Everything else uses servers directly. But apart from instancing these nodes, it doesn't do anything else to them, they are static. You tried 1000 but maybe there are more. Also, they are bodies, not just collision shapes.

The call stack indeed confirms the 3D spatial editor is computing a bounding box from scratch every frame.

In the present use case it's also mostly pointless to compute that bounding box, but that cannot be prevented.

rcorre commented 2 years ago

I filed https://github.com/godotengine/godot/issues/64398. I could repro it, but it took 9000 CollisionShape children to get to ~30% CPU usage. Is it a goal to eventually have VoxelInstancer use servers instead of nodes?

Zylann commented 2 years ago

I'd like to add some UI to display how many instances are present in the scene so we can get a sense of numbers with VoxelInstancer. It was a choice for me to use nodes here because it allows to get a collider reference more easily. If I change it to use physics server, returned colliders will always be "VoxelInstancer" without information about which instance exactly was collided with. If there was a way to trace back to the instance it could be an alternative, but I'm not sure if that's possible, PhysicServer wants a single ObjectID. Another option is to reduce colliders in a radius and dynamically spawn/despawn them (in addition to the whole LOD system which already does that on a larger scale), but that comes with other issues such as non-player entities no longer being able to collide when far enough (also not mentionning multiplayer), and it also makes the code more complex, all for an issue that so far seems to only happen in the Editor.

Zylann commented 2 years ago

With noise, there was 5,600 instances. Without noise, there was 11,168. Note: if instead of LOD index 2 I choose LOD index 1, and reduce density to 0.03 instead of 0.1, it goes down to 2,613, which is significantly less for a similar result from the point of view of the player.