NVIDIAGameWorks / PhysX

NVIDIA PhysX SDK
Other
3.19k stars 808 forks source link

Raycasts may fail after moving a lot of objects #550

Open francis-page opened 2 years ago

francis-page commented 2 years ago

Hi,

I've been investigating a low occurence bug where raycasts fail after moving a lot of objects around. I think I finally found what is causing it. When switching to the new AABBTree in AABBPruner::commit when a refit happened in the previous frame, the new AABBTree will contain outdated data. What happens:

In the frame where AABBPruner state is BUILD_LAST_FRAME:

In the next frame, where AABBPruner state is BUILD_FINISHED:

I tried to skip the BUILD_LAST_FRAME step and it seems to fix my issue. It's probably not the right fix as that step must have a reason to be.

PierreTerdiman commented 2 years ago

I'll investigate ASAP.

francis-page commented 2 years ago

Merci! (Oh... I remember browsing your Coder Corner with great interest in the early 2000s!)

PierreTerdiman commented 2 years ago

I haven't been able to reproduce this so far. Are you doing anything out of the ordinary for this to happen?

I did various tests like in the attached video, with a lot of objects moving around, and a test raycast against a particular object. The blue lines visualize the AABB tree for the scene, and it's rebuilt approximately every 100 frames. The test raycast never misses the test object, even though it's moving quite fast so any missed "refit" operation on the BVH should be exposed by this.

I suppose you don't have an easy repro for me to look at?

https://user-images.githubusercontent.com/11311181/161495756-6e875413-5224-4ee7-8071-c7e3cefefc33.mp4

francis-page commented 2 years ago

Hi Pierre,

I attached a modified version of the Hello World snippet that will eventually reproduce the issue if you let it run. Just put a breakpoint on line 217 and it will stop when a ray cast fails. I also added this: printf("%p %d\n", this, mProgress); in AABBPruner::buildStep so I could follow the state of the pruners. If you add this, you will see that it always happens when the objects were moved when mProgress value was 5 (BUILD_LAST_FRAME). I also added some code in AABBPruner::commit to output the global bounding box of the current tree. The output when the bug occurs looks like this:

...
000001E01C928200 0
Moving objects to -4603.085938 1386.982300 -540.886414
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 2
000001E01C928200 0
000001E01C922890 3
000001E01C928200 0
000001E01C922890 4
000001E01C928200 0
Moving objects to -4359.113281 1271.340088 -536.499390
000001E01C922890 5
000001E01C922890 AABBPruner::commit switching trees, new world bounds X (-4603.135742..-4353.036133) Y (1386.477295..1387.487305) Z (-540.936401..-290.836548)
000001E01C928200 0

You can see that the new world's bounding box corresponds to the previous position of the objects and not to the new position that was set in the previous frame.

Note that I used static rigid bodies because that's what was used (sorry, I forgot to specify it). In the original project, the objects are never moved except when a specific event occurs and all the objects are moved somewhere else.

SnippetHelloWorld.zip

PierreTerdiman commented 2 years ago

Ok, for PhysX 4 you will need to modify the code in 2 places:

https://github.com/NVIDIAGameWorks/PhysX/blob/c3d5537bdebd6f5cd82fcaf87474b838fe6fd5fa/physx/source/scenequery/src/SqAABBPruner.cpp#L161

https://github.com/NVIDIAGameWorks/PhysX/blob/c3d5537bdebd6f5cd82fcaf87474b838fe6fd5fa/physx/source/scenequery/src/SqAABBPruner.cpp#L198

Add the last part with BUILD_LAST_FRAME there:

if(mProgress==BUILD_NEW_MAPPING || mProgress==BUILD_FULL_REFIT || mProgress==BUILD_LAST_FRAME)

And that should fix the bug.

The bug happened because BUILD_LAST_FRAME was added a long time after the initial code was written, and we forgot to update these two lines.

It went undetected until now (including in my own repro attempt above) because the scene tree contains N objects per leaf node. So the last tested AABB is not actually the AABB around just one moved object, it's the AABB around a group of spatially close objects. As long as the group's bounds still enclose the previous position of the moved object, the bug doesn't appear. It only appears if you teleport everything far away from the previous positions, as in your repro.

Thanks for that one, it was not an obvious bug.

francis-page commented 2 years ago

Thanks for that one, it was not an obvious bug.

Thanks to you! And yeah, it wasn't an easy one! It took me almost two weeks to get a proper repro and eventually isolate the cause.

PierreTerdiman commented 2 years ago

Two weeks is not bad. According to the Perforce history, this bug has been introduced at some point between 2013 and 2014 (I couldn't find exactly when). And nobody ever noticed. So, congratulations :)

K-Tone commented 2 years ago

Will take this into Unity :-)

allengx commented 2 years ago

About this issue || mProgress==BUILD_FINISHED Is it necessary?

because in some places buildStep doesn't commit right away

void SceneQueryManager::sceneQueryBuildStep(PruningIndex::Enum index)
{
    PX_PROFILE_ZONE("SceneQuery.sceneQueryBuildStep", mScene.getContextId());

    if (mPrunerExt[index].pruner() && mPrunerExt[index].type() == PxPruningStructureType::eDYNAMIC_AABB_TREE)
    {
        const bool buildFinished = static_cast<AABBPruner*>(mPrunerExt[index].pruner())->buildStep(false);
        if(buildFinished)
        {
            mPrunerNeedsUpdating = true;
        }
    }
}
allengx commented 1 year ago

@PierreTerdiman Reference francis-page example,add the code to the sceneDesc Settings:

sceneDesc.sceneQueryUpdateMode = PxSceneQueryUpdateMode::eBUILD_ENABLED_COMMIT_DISABLED;

This problem still exists. because in buildStep,BUILD_FINISHED state not commit right away

void SceneQueryManager::afterSync(PxSceneQueryUpdateMode::Enum updateMode)
{
    PX_PROFILE_ZONE("Sim.sceneQueryBuildStep", mScene.getContextId());

    if(updateMode == PxSceneQueryUpdateMode::eBUILD_DISABLED_COMMIT_DISABLED)
    {
        mPrunerNeedsUpdating = true;
        return;
    }

    // flush user modified objects
    flushShapes();

    bool commit = updateMode == PxSceneQueryUpdateMode::eBUILD_ENABLED_COMMIT_ENABLED;

    for(PxU32 i = 0; i<2; i++)
    {
        if(mPrunerExt[i].pruner() && mPrunerExt[i].type() == PxPruningStructureType::eDYNAMIC_AABB_TREE)
            static_cast<AABBPruner*>(mPrunerExt[i].pruner())->buildStep(true);

        if(commit)
            mPrunerExt[i].pruner()->commit();
    }

    //If we didn't commit changes, then the next query must perform that operation so this bool must be set to true, otherwise there should be 
    //no outstanding work required by queries at this time
    mPrunerNeedsUpdating = !commit;
}