NVIDIAGameWorks / PhysX-3.4

NVIDIA PhysX SDK 3.4
https://www.nvidia.com/

PhysX 3.4 crash on Android #113

Open palerzhang opened 4 years ago

palerzhang commented 4 years ago

I got a crash inside NpScene::fetchResults on Android. The call stack is:

Operating system: Android
                  0.0.0 Linux 4.9.148 #1 SMP PREEMPT Wed Jun 26 04:38:26 CST 2019 armv8l
CPU: arm
     ARMv1 ARM part(0x4100d0b0) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
     8 CPUs

GPU: UNKNOWN

Crash reason:  SIGSEGV /0x00000000
Crash address: 0x95880000
Process uptime: not available

Thread 29 (crashed)
 0  libclient.so!physx::Sq::AABBPruner::updateObjectsAndInflateBounds(unsigned int const*, unsigned int const*, physx::PxBounds3 const*, unsigned int) [PsUnixNeonInlineAoS.h : 524 + 0x0]
     r0 = 0x9587fff4    r1 = 0x00000001    r2 = 0xb9350eb0    r3 = 0x9587e800
     r4 = 0x00000001    r5 = 0xb1393474    r6 = 0xb1393464    r7 = 0xc728d700
     r8 = 0xb1393470    r9 = 0xb48ef700   r10 = 0xb9350ea8   r12 = 0x92906400
     fp = 0xb1393460    sp = 0xc728d690    lr = 0xc728d6c0    pc = 0xc4677c18
    Found by: given as instruction pointer in context
 1  libclient.so!physx::Sq::DynamicBoundsSync::sync(unsigned int const*, unsigned int const*, physx::PxBounds3 const*, unsigned int) [SqSceneQueryManager.cpp : 495 + 0x9]
     r4 = 0xc4677bb1    r5 = 0xc6dc4fd4    r6 = 0x00000002    r7 = 0xc452d9d1
     r8 = 0xc441f449    r9 = 0xc6dc4fd4   r10 = 0xb7c0a020    fp = 0xc728d77c
     sp = 0xc728d718    pc = 0xc452d9e3
    Found by: call frame info
 2  libclient.so!physx::Sc::SqBoundsManager::syncBounds(physx::Sc::SqBoundsSync&, physx::Sc::SqRefFinder&, physx::PxBounds3 const*, unsigned long long) [ScSqBoundsManager.cpp : 112 + 0x5]
     r4 = 0xbc6733c0    r5 = 0x00000000    r6 = 0xbf819648    r7 = 0xc452d9d1
     r8 = 0xc441f449    r9 = 0xc6dc4fd4   r10 = 0xb7c0a020    fp = 0xc728d77c
     sp = 0xc728d730    pc = 0xc455f745
    Found by: call frame info
 3  libclient.so!physx::Sc::Scene::syncSceneQueryBounds(physx::Sc::SqBoundsSync&, physx::Sc::SqRefFinder&) [ScScene.cpp : 3659 + 0x7]
     r4 = 0xc6dc3800    r5 = 0x00000000    r6 = 0xc6dc3820    r7 = 0xc441b871
     r8 = 0x00000000    r9 = 0x00000001   r10 = 0x0000000a    fp = 0xc81d3000
     sp = 0xc728d760    pc = 0xc454fb15
    Found by: call frame info
 4  libclient.so!physx::NpScene::fetchResultsPostContactCallbacks() [NpScene.cpp : 2273 + 0x5]
     r4 = 0xc6dc3800    r5 = 0xc6dc3820    r6 = 0xc6dc3820    r7 = 0xc441b871
     r8 = 0x00000000    r9 = 0x00000001   r10 = 0x0000000a    fp = 0xc81d3000
     sp = 0xc728d778    pc = 0xc441bb8d
    Found by: call frame info
 5  libclient.so!physx::NpScene::fetchResults(bool, unsigned int*) [NpScene.cpp : 2308 + 0x5]
     r4 = 0xc6dc3800    r5 = 0x00000000    r6 = 0xc6dc3820    r7 = 0xc441b871
     r8 = 0x00000000    r9 = 0x00000001   r10 = 0x0000000a    fp = 0xc81d3000
     sp = 0xc728d798    pc = 0xc441bd39
    Found by: call frame info

The Android SDK level is 28 and the OS is Android 9.0. The last call (frame 0) is in the file PsUnixNeonInlineAoS.h, but I cannot find this file in the PhysX 3.4 sources. It seems to be a really rare kind of crash, as I cannot find anything useful on the Internet.

AlesBorovicka commented 4 years ago

This could happen if some NaN values went into the simulation, or perhaps there is an object falling to infinity. Have you tried running the CHECKED configuration? It includes NaN checks for the input functions.
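
As a rough illustration of the kind of input check suggested here, the sketch below rejects NaN and infinite values before they are handed to a physics engine. The types `Vec3` and `Pose` and the helper `isSafeToSubmit` are hypothetical stand-ins written for this example; they are not PhysX classes and do not show how the CHECKED build implements its checks.

```cpp
#include <cmath>

// Stand-in types for illustration only; these are NOT the PhysX classes.
struct Vec3 { float x, y, z; };
struct Pose { Vec3 position; Vec3 linearVelocity; };

// True when every component is a finite number (neither NaN nor +/-infinity).
inline bool isFiniteVec3(const Vec3& v)
{
    return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}

// Hypothetical gate to run on application data before it reaches the simulation.
inline bool isSafeToSubmit(const Pose& p)
{
    return isFiniteVec3(p.position) && isFiniteVec3(p.linearVelocity);
}
```

Logging the offending object whenever such a gate fails in a release build may help with a crash that cannot be reproduced locally.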

palerzhang commented 4 years ago

Thanks for your reply. Unfortunately, the crash is not reproducible and happens only rarely among our customers. But I will try what you proposed.

By the way, I re-checked the code of physx::Sq::DynamicBoundsSync::sync that we use and found that we are on a much older version of PhysX 3.4. So I am upgrading the code to see whether this happens again among our customers.

palerzhang commented 4 years ago

Unfortunately, we are unable to upgrade the PhysX version yet, for various reasons. Luckily, though, I found the file PsUnixNeonInlineAoS.h and the code at the top of the call stack. The crash occurred when loading floats with the NEON intrinsic vld1q_f32 (the first V4LoadU in the function inflateBounds in SqBounds.h).

PX_FORCE_INLINE void inflateBounds(PxBounds3& dst, const PxBounds3& src)
{
    using namespace physx::shdfnd::aos;

    const Vec4V minV = V4LoadU(&src.minimum.x);
    const Vec4V maxV = V4LoadU(&src.maximum.x);
    const Vec4V eV = V4Scale(V4Sub(maxV, minV), FLoad(0.5f * 0.01f));

    V4StoreU(V4Sub(minV, eV), &dst.minimum.x);
    PX_ALIGN(16, PxVec4) max4;
    V4StoreA(V4Add(maxV, eV), &max4.x);
    dst.maximum = PxVec3(max4.x, max4.y, max4.z);
}
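
A four-float load such as vld1q_f32 reads 16 bytes, so it can begin at a valid address and still touch the next page. The sketch below does that arithmetic with the values from the crash report: r0 in frame 0 (0x9587fff4) and the crash address (0x95880000) are exactly 12 bytes apart. It assumes that r0 held the load address and that pages are 4 KiB, neither of which the report confirms, so treat it as one possible reading rather than a diagnosis. The helper names are made up for this example.

```cpp
#include <cstdint>

// Size in bytes of a four-float vector load such as vld1q_f32.
constexpr std::uint32_t kVec4LoadBytes = 4 * sizeof(float);

// Page size ASSUMED for illustration (4 KiB is common on Android/ARM).
constexpr std::uint32_t kPageBytes = 4096;

// Address of the last byte touched by a 16-byte load starting at addr.
constexpr std::uint32_t lastByteLoaded(std::uint32_t addr)
{
    return addr + kVec4LoadBytes - 1;
}

// True when a 16-byte load starting at addr spills into the following page.
constexpr bool crossesPage(std::uint32_t addr)
{
    return (addr / kPageBytes) != (lastByteLoaded(addr) / kPageBytes);
}

// Values copied from the crash report: r0 of frame 0 and the crash address.
constexpr std::uint32_t kR0 = 0x9587fff4u;
constexpr std::uint32_t kCrashAddress = 0x95880000u;

// The two values are three floats (12 bytes) apart.
static_assert(kCrashAddress - kR0 == 3 * sizeof(float),
              "r0 and the crash address differ by 12 bytes");

// Under the assumptions above, the load would end inside the faulting page.
static_assert(crossesPage(kR0), "a 16-byte load from r0 crosses a page boundary");
```

If this reading is right, the three floats of the vector sit at the very end of a mapped page and only the fourth lane of the load falls on the unmapped page that follows.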

As the crash reason is SIGSEGV, I wonder whether an invalid address was accessed. However, &src.minimum.x appears to be a valid address, so I went back to find where the parameter src comes from. In ScScene.cpp:

void Sc::Scene::syncSceneQueryBounds(SqBoundsSync& sync, SqRefFinder& finder)
{
    mSqBoundsManager->syncBounds(sync, finder, mBoundsArray->begin(), getContextId());
}

It comes from mBoundsArray, accessed with an index. mBoundsArray->begin() returns the internal data array, but that array will be NULL if there are no elements, and no empty check is performed before using it. I wonder whether this is what causes the crash. My guess may be wrong, as I have not read the whole code base, and there may be some mechanism I missed that prevents mBoundsArray->begin() from being NULL.
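
To make the concern concrete, here is a minimal sketch of the kind of guard I have in mind. `Bounds3`, `BoundsArray` and `syncBoundsGuarded` are hypothetical stand-ins written for this example, not PhysX code, and I have not established that PhysX can actually reach the empty-array state.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a bounds record; NOT the PhysX PxBounds3 class.
struct Bounds3 { float minimum[3]; float maximum[3]; };

// Stand-in container that mimics the behaviour described above:
// begin() yields a null pointer when the array holds no elements.
class BoundsArray
{
public:
    void pushBack(const Bounds3& b) { mData.push_back(b); }
    const Bounds3* begin() const { return mData.empty() ? nullptr : mData.data(); }
    std::size_t size() const { return mData.size(); }
private:
    std::vector<Bounds3> mData;
};

// Hypothetical guarded sync: returns how many bounds it would have processed.
// It bails out on null inputs and skips indices outside the bounds array.
inline std::size_t syncBoundsGuarded(const Bounds3* bounds, std::size_t boundsCount,
                                     const unsigned* indices, std::size_t indexCount)
{
    if (bounds == nullptr || indices == nullptr)
        return 0; // nothing safe to read

    std::size_t processed = 0;
    for (std::size_t i = 0; i < indexCount; ++i)
    {
        if (indices[i] >= boundsCount)
            continue; // out-of-range index: skip instead of dereferencing
        const Bounds3& src = bounds[indices[i]];
        (void)src; // a real implementation would inflate and store src here
        ++processed;
    }
    return processed;
}
```

A guard like this only hides the symptom. If an index can refer to a slot that does not exist, the bookkeeping that produced the index is probably where the real bug is.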