dotnet / corert

This repo contains CoreRT, an experimental .NET Core runtime optimized for AOT (ahead of time compilation) scenarios, with the accompanying compiler toolchain.
http://dot.net
MIT License

Pinvoked libraries somehow get broken in CoreRT #7341

Closed Alan-FGR closed 5 years ago

Alan-FGR commented 5 years ago

Hello there.

I'm trying to debug a pretty hairy behavior in CoreRT here. This isn't a proper issue because my time is limited and I'm struggling to make sense of what's happening, but I think it's better to use an issue instead of Gitter to log my findings and maybe get help.

I'm working on C# bindings for the PhysX library, and I have a testbed sample that renders the debug geometry from PhysX. It works fine on both .NET Framework (4.7.2) and .NET Core 2.1, and the code seems correct. However, I couldn't get it to work properly on CoreRT.

At first I thought it could be something in my pinvokes/wrappers, but after debugging the output binary in VS, the stack traces point to the PhysX internals, most of the time to some threading code. At that point I tried to make a minimal repro using only the PhysX libraries, but I simply couldn't get the problem to manifest. I then went the other way and stripped stuff from the sample until I got something that works. That's when I noticed that for some reason the contact solving in PhysX is broken. Take a look at these lovely gifs :trollface:

Here's the proper expected result on CoreCLR: [gif: coreclr]

Here's what happens on CoreRT: [gif: corert]

However, even though the error happens in PhysX, the thing works when I remove the other libraries (I was trying to make a minimal SSCCE). Here's minimal code that does the same thing with the other libraries removed (it plots the cube height to the console): https://github.com/Alan-FGR/CoreRT_SharpPhysX_Debug and it works fine on CoreRT. And here's a branch of SharpPhysX (https://github.com/Alan-FGR/SharpPhysX/tree/corert_debug) with some kinda-minimal code that doesn't work. Note how the code that has to do with PhysX is the same (not 1:1, but the pinvoke calls are). The project in question is "dotnetCore"; the code is all in the DebugRenderer, though, because I thought it could be the abstract impls :P...

I've also tried removing PhysX and keeping just the rendering (bgfx debug text and whatnot), and that works fine. The thing only crashes when I'm using all the pinvoked libs at the same time, which makes me think that somehow they're conflicting with each other, like maybe they're writing over each other's memory or something. I don't really know... but that makes debugging very hard, especially since I'm a total n00b at debugging interop.

I'm going to try and make some better repros, but in case some of you want to give it a try, I think that's all you need. The native PhysX is an open-source project too (https://github.com/NVIDIAGameWorks/PhysX/tree/4.0), and the native library (LibSharpPhysX) was built against the latest 4.0.

Alan-FGR commented 5 years ago

Hey guys, I'm now getting consistent and sensible stack traces like this:

LibSharpPhysX.dll!physx::PxDefaultSimulationFilterShader(unsigned int attributes0, physx::PxFilterData filterData0, unsigned int attributes1, physx::PxFilterData filterData1, physx::PxFlags<enum physx::PxPairFlag::Enum,unsigned short> & pairFlags, const void * constantBlock, unsigned int constantBlockSize) Line 234
    at c:\projects\physx\physx\source\physxextensions\src\extdefaultsimulationfiltershader.cpp(234)
dotnetCore.exe!Flt2LngOvf()
dotnetCore.exe!Flt2LngOvf()
PhysX_64.dll!runFilterShapeSim(physx::PxFilterInfo & filterInfo, const physx::Sc::FilteringContext & context, const physx::Sc::ShapeSim & e0, const physx::Sc::ShapeSim & e1, const unsigned int fa0, const unsigned int fa1) Line 306
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(306)
PhysX_64.dll!filterRbCollisionPairSecondStage(const physx::Sc::FilteringContext & context, const physx::Sc::ShapeSim & s0, const physx::Sc::ShapeSim & s1, bool kine0, bool kine1, const unsigned int fa0, const unsigned int fa1) Line 357
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(357)
PhysX_64.dll!filterRbCollisionPair(const physx::Sc::FilteringContext & context, const physx::Sc::ShapeSim & s0, const physx::Sc::ShapeSim & s1) Line 550
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(550)
PhysX_64.dll!physx::Sc::NPhaseCore::runOverlapFilters(unsigned int nbToProcess, const physx::Bp::AABBOverlap * pairs, physx::PxFilterInfo * filterInfo, unsigned int & nbToKeep_, unsigned int & nbToSuppress_, unsigned int & nbToCallback_, unsigned int * keepMap, unsigned int * callbackMap) Line 574
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(574)
PhysX_64.dll!OverlapFilterTask::runInternal() Line 5745
    at c:\projects\physx\physx\source\simulationcontroller\src\scscene.cpp(5745)
PhysX_64.dll!physx::Cm::Task::run() Line 67
    at c:\projects\physx\physx\source\common\src\cmtask.h(67)
LibSharpPhysX.dll!physx::Ext::DefaultCpuDispatcher::runTask(physx::PxBaseTask & task) Line 97
    at c:\projects\physx\physx\source\physxextensions\src\extdefaultcpudispatcher.h(97)
LibSharpPhysX.dll!physx::Ext::CpuWorkerThread::execute() Line 97
    at c:\projects\physx\physx\source\physxextensions\src\extcpuworkerthread.cpp(97)
PhysXFoundation_64.dll!physx::shdfnd::`anonymous namespace'::PxThreadStart(void * arg) Line 101
    at c:\projects\physx\physx\source\foundation\src\windows\pswindowsthread.cpp(101)

It has something to do with filter data: it's a memory access violation for one of the filters. So don't bother for now, I think I've got this... I don't have much time to investigate, but it's starting to look like it could be some error on my side that CoreRT won't forgive. I don't know...

Alan-FGR commented 5 years ago

Nah, it's not consistent. Sometimes the filter data is valid and I get this instead: ERROR: pFrame->m_savedThread was nullptr.

[Inline Frame] dotnetCore.exe!Thread::InlineReversePInvokeReturn(ReversePInvokeFrame *) Line 1232
    at e:\a\_work\600\s\corert_2607052\src\native\runtime\thread.cpp(1232)
dotnetCore.exe!RhpReversePInvokeReturn2(ReversePInvokeFrame * pFrame) Line 1397
    at e:\a\_work\600\s\corert_2607052\src\native\runtime\thread.cpp(1397)
dotnetCore.exe!Flt2LngOvf()
PhysX_64.dll!runFilterShapeSim(physx::PxFilterInfo & filterInfo, const physx::Sc::FilteringContext & context, const physx::Sc::ShapeSim & e0, const physx::Sc::ShapeSim & e1, const unsigned int fa0, const unsigned int fa1) Line 306
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(306)
PhysX_64.dll!filterRbCollisionPairSecondStage(const physx::Sc::FilteringContext & context, const physx::Sc::ShapeSim & s0, const physx::Sc::ShapeSim & s1, bool kine0, bool kine1, const unsigned int fa0, const unsigned int fa1) Line 357
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(357)
PhysX_64.dll!filterRbCollisionPair(const physx::Sc::FilteringContext & context, const physx::Sc::ShapeSim & s0, const physx::Sc::ShapeSim & s1) Line 550
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(550)
PhysX_64.dll!physx::Sc::NPhaseCore::runOverlapFilters(unsigned int nbToProcess, const physx::Bp::AABBOverlap * pairs, physx::PxFilterInfo * filterInfo, unsigned int & nbToKeep_, unsigned int & nbToSuppress_, unsigned int & nbToCallback_, unsigned int * keepMap, unsigned int * callbackMap) Line 574
    at c:\projects\physx\physx\source\simulationcontroller\src\scnphasecore.cpp(574)
PhysX_64.dll!OverlapFilterTask::runInternal() Line 5745
    at c:\projects\physx\physx\source\simulationcontroller\src\scscene.cpp(5745)
PhysX_64.dll!physx::Cm::Task::run() Line 67
    at c:\projects\physx\physx\source\common\src\cmtask.h(67)
LibSharpPhysX.dll!physx::Ext::DefaultCpuDispatcher::runTask(physx::PxBaseTask & task) Line 97
    at c:\projects\physx\physx\source\physxextensions\src\extdefaultcpudispatcher.h(97)
LibSharpPhysX.dll!physx::Ext::CpuWorkerThread::execute() Line 97
    at c:\projects\physx\physx\source\physxextensions\src\extcpuworkerthread.cpp(97)
PhysXFoundation_64.dll!physx::shdfnd::`anonymous namespace'::PxThreadStart(void * arg) Line 101
    at c:\projects\physx\physx\source\foundation\src\windows\pswindowsthread.cpp(101)

When they're valid they're 0, though, which makes me think that could be happening by chance (since zeroing memory is relatively common and zeroed objects are normally valid). So that still doesn't rule out that my code could be wrong and that CoreCLR/netfx just happen to zero that memory, which makes it valid. Even when it's 0 the thing crashes, though.

Alan-FGR commented 5 years ago

OK guys, here's some more stuff I figured out: the filter shader is a function pointer, so to make sure it wasn't my stuff that was wrong, I made my own impl like this:

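#include <cstdio>          // printf
#include "PxPhysicsAPI.h"  // PhysX types used below
using namespace physx;

// Replacement filter shader: logs each call and accepts default contact handling.
PxFilterFlags SomeFilterShader(
    PxFilterObjectAttributes attributes0, PxFilterData filterData0,
    PxFilterObjectAttributes attributes1, PxFilterData filterData1,
    PxPairFlags& pairFlags, const void* constantBlock, PxU32 constantBlockSize)
{
    printf("filter called\n");
    pairFlags = PxPairFlag::eCONTACT_DEFAULT;
    return PxFilterFlag::eDEFAULT;
}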

That works: when collisions happen the text is printed (so we can rule out wrong pointers and such). On CoreRT, though, it's printed once and then the errors with the stack traces above happen.

Alan-FGR commented 5 years ago

OK guys, I figured out what the problem is: declaring the function pointer as a delegate (with the UnmanagedFunctionPointer attribute). If I use an IntPtr instead, it works fine on CoreRT! :smiley: I'm not sure whether that's a bug, maybe a missing impl for that attribute, or if that's just me not doing stuff correctly (as usual :trollface:).

Alan-FGR commented 5 years ago

I think the problem is that I assumed UnmanagedFunctionPointer would actually turn a C# delegate into an actual function pointer, which is not the case. The inconsistency between runtimes is still there, but the code wasn't sound to begin with.
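
For anyone landing here later, here is a rough sketch of the distinction as seen from the native side. Native code like PhysX stores and calls a raw code address, and an IntPtr simply carries that address through managed code as a pointer-sized integer. The C++ below models only that pass-through. `FilterData`, `FilterFn`, `defaultFilter`, `getDefaultFilter`, `setFilter` and `runFilter` are hypothetical stand-ins, not PhysX or LibSharpPhysX API, and the sketch assumes function pointers and `uintptr_t` have the same size (true on mainstream 64-bit platforms):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for PxFilterData: four 32-bit words passed by value.
struct FilterData { std::uint32_t word0, word1, word2, word3; };

// Hypothetical callback type: a plain function pointer, like the filter shader.
using FilterFn = std::uint32_t (*)(std::uint32_t attrs0, FilterData fd0,
                                   std::uint32_t attrs1, FilterData fd1);

// Assumption of this sketch: a code address fits in a pointer-sized integer.
static_assert(sizeof(FilterFn) == sizeof(std::uintptr_t),
              "function pointer and uintptr_t sizes differ");

// Hypothetical native filter; returns a value we can check in a test.
std::uint32_t defaultFilter(std::uint32_t attrs0, FilterData,
                            std::uint32_t attrs1, FilterData) {
    return attrs0 + attrs1;
}

// Hands the address out as an integer, the way an IntPtr would hold it.
std::uintptr_t getDefaultFilter() {
    return reinterpret_cast<std::uintptr_t>(&defaultFilter);
}

// The callback currently installed (hypothetical global for brevity).
FilterFn g_filter = nullptr;

// Receives the integer back and restores the callable pointer unchanged.
void setFilter(std::uintptr_t address) {
    g_filter = reinterpret_cast<FilterFn>(address);
}

// Calls whatever was installed, with zeroed filter data.
std::uint32_t runFilter(std::uint32_t attrs0, std::uint32_t attrs1) {
    assert(g_filter != nullptr);
    return g_filter(attrs0, FilterData{}, attrs1, FilterData{});
}
```

After `setFilter(getDefaultFilter())`, a call to `runFilter(2, 3)` goes straight to the original function and returns 5, with no wrapper in between. A delegate, by contrast, is a managed object. When the callback is implemented in C#, `Marshal.GetFunctionPointerForDelegate` is the documented way to get a native-callable address for it, and the delegate has to be kept alive for as long as native code holds that address.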