NVIDIA-Omniverse / PhysX

NVIDIA PhysX SDK
BSD 3-Clause "New" or "Revised" License
2.54k stars 361 forks source link

Crash inside SAP Broad phase when there is too much collision pairs #262

Open Elercia opened 6 months ago

Elercia commented 6 months ago

Library and Version & Operating System

PhysX v5.0 Window 10

Steps to Trigger Behavior

I'm sorry but I do not have a clear repro steps to give you. I only have a callstack:

[Inline Frame] PhysX64_r.dll!physx::PxMemCopy(void *) Line 85   C++
[Inline Frame] PhysX64_r.dll!physx::Bp::resizeBroadPhasePairArray(const unsigned int scratchAllocator, const unsigned int elements, physx::PxcScratchAllocator *) Line 1144 C++
PhysX64_r.dll!physx::Bp::BroadPhaseSap::batchUpdate(const unsigned int Axis, physx::Bp::BroadPhasePair * & pairs, unsigned int & pairsSize, unsigned int & pairsCapacity) Line 1303 C++
PhysX64_r.dll!physx::Bp::BroadPhaseBatchUpdateWorkTask::runInternal() Line 691  C++
[Inline Frame] PhysX64_r.dll!physx::Bp::BroadPhaseSap::update() Line 705    C++
PhysX64_r.dll!physx::Bp::BroadPhaseSap::update(physx::PxcScratchAllocator * scratchAllocator, const physx::Bp::BroadPhaseUpdateData & updateData, physx::PxBaseTask * __formal) Line 498    C++
PhysX64_r.dll!physx::Bp::AABBManager::updateBPSecondPass(unsigned int __formal, physx::PxcScratchAllocator * scratchAllocator, physx::PxBaseTask * continuation) Line 1561  C++
PhysX64_r.dll!physx::Sc::Scene::updateBroadPhase(physx::PxBaseTask * continuation) Line 3113    C++
PhysX64_r.dll!physx::Cm::DelegateTask<physx::Sc::Scene,&physx::Sc::Scene::updateBroadPhase>::run() Line 103 C++

All I know is there is a wrong setup inside one of our scene. The code seems to have moved a lot of objects to the same position resulting to a crash inside BroadPhaseSap::batchUpdate. BatchUpdate calls resizeBroadPhasePairArray when it needs to add collision pairs. The problem is that in our dump we have 536 870 912 pairs to alloc (newMaxNumPairs value) resulting in an allocation of 4 294 967 296 bytes that is 0 in uint32.

Expected Behavior

As said in the documentation here there is a maximum number of contacts that is possible to process. I understand that broad phase could possibly output more pairs but I believe that there is a limit that can be implemented to discard pairs above a certain point.

You previously suggested in #222 or #226 that we might want to reduce the shape count per actor & do not move this much actors to the same location. Since there is a clear bug inside of our code, I don't think both of these solutions are applicable. But I believe the code could output an error & drop collisions instead of crashing. Also, running in debug mode to have warnings & asserts is not suitable since the crash is in an application we are shipping to various people working on the project.

Is it possible to limit the number of output pairs of the broad phase and have an error telling something went wrong? Thank you for the support!

Actual Behavior

SAP broad phase have no limit of pair output count.

preist-nvidia commented 6 months ago

Thank you @Elercia - we'll check your suggestion and get back to you. Internal tracking: PX-4920.

PierreTerdiman commented 6 months ago

I don't think there is currently a way to prevent this sort of problems. 536 870 912 pairs is too much, and even if we prevent the crash in the broadphase(s) itself, it is likely that some other parts of the pipeline will run out of memory as well immediately afterwards (I am thinking of the "interaction" objects in the "SimulationController" layer for example: they also allocate per-pair object and these ones are much larger than the ones from the broadphase).