godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.99k stars 21.17k forks source link

Rebuilding navmesh while rigid bodies are simulating seems to fail sometimes #81181

Closed Wiggan closed 1 year ago

Wiggan commented 1 year ago

Godot version

4.1.stable

System information

Windows10, AMD Ryzen 9 5900X, NVIDIA GeForce RTX 3060 Ti, driver version 30.0.15.1179, forward+.

Issue description

When I simulate boxes falling and let them affect the navmesh, rebuilding the navmesh sometimes renders very strange results.

A code snippet of how I rebake:

extends Node3D

@export var navigation_region: NavigationRegion3D

var baking = false
var queue_bake = false

# Called when the node enters the scene tree for the first time.
func _ready():
    #for cover in get_children():
    #   cover.destroyed.connect(cover_destroyed)
    navigation_region.bake_finished.connect(on_bake_finished)

func cover_destroyed():
    print("Cover destroyed, rebuilding navmesh")
    rebake_nav_mesh()

func rebake_nav_mesh():
    if baking:
        queue_bake = true
    else:
        queue_bake = false
        navigation_region.bake_navigation_mesh()
        baking = true

func on_bake_finished():
    baking = false
    if queue_bake:
        rebake_nav_mesh()

The philosophy is to not cause multiple rebakes at the same time by keeping track of if we are already rebaking.

https://github.com/godotengine/godot/assets/6214882/63e7351a-a11a-45d4-a3a8-c6e90264ee3a

I've added a timer that rebakes in 2hz, and at one point the nav mesh almost disappears.

Steps to reproduce

I have tried and failed to produce a repex. If I let the boxes fall down from the sky, it does not seem to happen.

Minimal reproduction project

Sadly I've not been able to create one.

smix8 commented 1 year ago

You really should avoid "baking" a navigation mesh instantly on random events.

Baking with the NavigationRegion3D involves parsing node data from the SceneTree. The SceneTree is super-brittle when you do this randomly and not deferred (e.g. threading-locks, child-change locks, ...) The parsing also needs to get data from the PhysicsServer or RenderingServer which might be thread-locked at this point.

If I should guess the SceneTree is blocking the parsing operation and as a result the source geometry data used in the navigation mesh baking process is just empty or corrupted hence the new navigation mesh is empty or broken.

Baking a navigation mesh is a deferred and slow process never intended to be rapid-fired all the time. It might only seem to work because your game is currently tiny and the baking is still fast. It is also a waste of good processing power as the NavigationServer3D only syncs once each physics frame so all the other rapid-fire bakes are just ignored and do nothing.

Wiggan commented 1 year ago

Hi and thank you for the response. As far as I've read, it is per default run on a separate thread and can result in lots of sync, but should still render predictable results don't you think? Otherwise it should at the very least report an error, but the log is clean. What would be the recommended approach to destructible obstacles, if you would be so kind?

smix8 commented 1 year ago

The navigation mesh baking in the Editor is single-threaded. The navigation mesh baking at runtime is by defaul multi-threaded if it is still enabled in the ProjectSettings (default).

That the baking is threaded should be mostly irrelevant because the problematic parsing is always single-threaded as the SceneTree is not thread-safe. If a navigation mesh resource is already baking it blocks further bakes until completed.

Without an MRP I can not say for certain what causes this in your project.

It is very likely that this is not a bug in the navigation mesh baking pipeline and more a bug from not receiving correct source geometry data from the physics or rendering objects and their servers. In that case the navigation mesh baking does not know that the data is not good so you wouldn't get any error out of it.

For your project I would limit the update of the navigation mesh to max 1 update every main-loop instead of linking it directly to a signal. E.g. use a dirty flag that _process() picks up and if a change occurred use a single deferred call to the bake function. This makes sure that the source geometry parsing runs at a thread-safe moment for the SceneTree with no server locked down.

In general changing the navigation mesh too rapidly every frame will give you problems with agents and their pathfinding. I would say no game, not even fast paced, have a real reason to rebake a navigation mesh more than 1-2 times in a second.

Wiggan commented 1 year ago

Thanks! So the recommended method is still rebaking then. In this video I did it in 2hz and not on events. Call_deferred was a good idea, I'll try that. If that fixes the problem, maybe a classic warning a la CollisionShape.set_disabled() could do the trick.

smix8 commented 1 year ago

While I see from the video that things can happen I now baked ten thousand of navigation meshes from all kind of physics shapes or rendering meshes in rapid repetition without a single issue like that. So without a MRP that allows to reproduce this reported issue reliably and confirm that it still exists in Godot master there is little that can be done here.

If Wiggan or someone else that reads this can create a MRP to reproduce the issue let me know so we can reopen but for now I am closing this issue because I have not enough information to do anything with it.

Wiggan commented 1 year ago

I had success after making sure to rebake deferred, so I think adding a warning if not called deferred would be a good solution!

smix8 commented 1 year ago

@Wiggan When did you rebake without deferred? Because the NavigationRegion3D.bake_navigation_mesh() already has an error warning when not using the main thread and also makes sure that the finished baked navigation mesh is set with the main thread. That is also the reason why I gave up without MRP because the only for me plausible source would be SceneTree thread use but that is blocked and warned already. I did not get any error or faced the same problem when I tried to recreate your example script and scene.

Wiggan commented 1 year ago

Hi and thank you for the response. As far as I've read, it is per default run on a separate thread and can result in lots of sync, but should still render predictable results don't you think? Otherwise it should at the very least report an error, but the log is clean. What would be the recommended approach to destructible obstacles, if you would be so kind?

Godot 4.1Stable is what I reported. After your tip about doing call_deferred I never saw any problems any more. Nothing at all in the log when it happened for me. I have not tested if there is an error or warning for it in 4.1.1.