dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.48k stars 4.76k forks source link

.Net9 RC1 Crash when creating Java Runtime through interop (and also .Net9 all previews) #107651

Closed fmaeseele closed 2 months ago

fmaeseele commented 2 months ago

Description

Calling CreateJVM function through interop crashes.

Reproduction Steps

See provided sample.

Expected behavior

Using interop to create a java runtime should not crash process.

Actual behavior

Using interop to create a java runtime crashes the process.

Regression?

Yes, comparing to .Net8

Known Workarounds

Using false in the .csproj fixed the issue.

Configuration

Windows 11 Professionnel 23H2 22631.4169

Other information

No response

fmaeseele commented 2 months ago

Use this project to reproduce the issue. Consider having JAVA_HOME environment variable defined so the sample can run.

TestCreateJVM.zip

filipnavara commented 2 months ago

Not saying that it's the root cause of the crash but you are using Marshal.UnsafeAddrOfPinnedArrayElement on unpinned array:

The array must be pinned using a GCHandle before it is passed to this method. For maximum performance, this method does not validate the array passed to it; this can result in unexpected behavior.

fmaeseele commented 2 months ago

Hi, Thank you for your comment. I could provide another sample using unsafe method instead. But the point is that it was perfectly working with .Net8 and just migrating to .Net9 make my software crashes. That's all this is about.

EgorBo commented 2 months ago

I wonder if JVM simply doesn't support CET processes https://github.com/dotnet/runtime/pull/103311 cc @mangod9 @jkotas

EgorBo commented 2 months ago

At least, the repro does work fine on net8.0 and crashed on net9.0, with this: image

mangod9 commented 2 months ago

Doesn't seem like it's CET related. Adding @AaronRobinsonMSFT to take an initial look on whether it's an interop issue?

fmaeseele commented 2 months ago

Hi, Still the same remark. I'm very found on migrating to .net9 for many reasons, but we can't because of this issue. .Net8 is working like a charm for our application. As we are a very big company in Europe, we simply can't migrate to .Net9 but this is gonna be a support issue as .Net8 is not a long term support.

EgorBo commented 2 months ago

A bit more simplified repro (with blittable structs): https://gist.github.com/EgorBo/4b67d51a5afa9e54668b4fc853875e2f

fmaeseele commented 2 months ago

bu thank you guys for taking care of this 👍

EgorBo commented 2 months ago

Doesn't seem like it's CET related. Adding @AaronRobinsonMSFT to take an initial look on whether it's an interop issue?

Once I disabled /CETCOMPAT flag for Corerun it has fixed the issue

fmaeseele commented 2 months ago

Doesn't seem like it's CET related. Adding @AaronRobinsonMSFT to take an initial look on whether it's an interop issue?

Once I disabled /CETCOMPAT flag for Corerun it has fixed the issue Even if this workaround works, is this something planned to be fixed to work like .Net8 ? Is there a simple way for me to test this ?

mangod9 commented 2 months ago

Ok interesting. @fmaeseele are you running with an apphost ? The workaround would be to disable using CETCompat =false in your csproj.

Adding @janvorli as fyi.

EgorBo commented 2 months ago

Ah, nice that we provide a runtime knob/msbuild property to disable it, the problem that it's not clear who and when should use it - how users are expected to debug such crashes and find out about the property?

fmaeseele commented 2 months ago

Ok interesting. @fmaeseele are you running with an apphost ? The workaround would be to disable using CETCompat =false in your csproj.

Adding @janvorli as fyi.

The project is the one provided, but to answer your question, unlike the provided sample, at the end you are right, this is an appHost project. Could you tell me how I can workaround this issue from you comment please ?

hamarb123 commented 2 months ago

Try adding <CETCompat>false</CETCompat> in your .csproj file, within the property group

fmaeseele commented 2 months ago

Try adding <CETCompat>false</CETCompat> in your .csproj file, within the property group

Working! But what is the impact of this property ?

Suchiman commented 2 months ago

It disables compatibility with Control Flow Enforcement Technology which is a security mitigation that protects against ROP (return oriented programming) hacking techniques that is enabled by default starting in .NET 9.

mangod9 commented 2 months ago

Ah, nice that we provide a runtime knob/msbuild property to disable it, the problem that it's not clear who and when should use it - how users are expected to debug such crashes and find out about the property?

we should get it documented appropriately. @fmaeseele, in 9 we have enabled CET (control-flow enforcement technology) which is enabled on recent x64 hardware (only on windows). It should be in compat mode so shouldn't cause breaks, so we will look into this. Perhaps Java isn't quite compliant.

fmaeseele commented 2 months ago

Ah, nice that we provide a runtime knob/msbuild property to disable it, the problem that it's not clear who and when should use it - how users are expected to debug such crashes and find out about the property?

we should get it documented appropriately. @fmaeseele, in 9 we have enabled CET (control-flow enforcement technology) which is enabled on recent x64 hardware (only on windows). It should be in compat mode so shouldn't cause breaks, so we will look into this. Perhaps Java isn't quite compliant.

I whish to be still able to run a Java runtime under .net9 (like in .net8) through interop unless it is clearly mentioned that I can't and why, and what would be the workaround. Thank you guys!

mangod9 commented 2 months ago

for now using the workaround should keep CET disabled for your scenario (which is same as what it was in 8)

fmaeseele commented 2 months ago

I updated the issue with the workaround but my question remains: is the workaround going to be a permanent fix and what am I missing using this workaround ? Or is it something that going to be fixed in .Net9 ?

sylveon commented 2 months ago

The most likely thing is that the workaround will stay in place until Java becomes compatible with CET

fmaeseele commented 2 months ago

From interop point of view, does someone could explain to me what is causing the issue ?

Suchiman commented 2 months ago

Enabling CET requires that all code in the process complies with the CET requirements and java is seemingly not compatible so the problem is not the interop code but the fact that as soon as java code runs, the CPU detects a CET violation, notifies the OS which then kills the process

fmaeseele commented 2 months ago

Enabling CET requires that all code in the process complies with the CET requirements and java is seemingly not compatible so the problem is not the interop code but the fact that as soon as java code runs, the CPU detects a CET violation, notifies the OS which then kills the process

Thank you for the explanation.

fmaeseele commented 2 months ago

Last question: is it something that going to be fixed in .Net9 final relase or not ?

mangod9 commented 2 months ago

Last question: is it something that going to be fixed in .Net9 final relase or not ?

we will investigate and update this issue. But since the workaround is easy it might stay this way

fmaeseele commented 2 months ago

Thanks to all you guys. But frankly, such issue is annoying. Sap (my compagny), won't buy the workaround and should ask for explanation on why migrating to .net9 is causing issue. Unless you guys specify this issue in the release notes.

janvorli commented 2 months ago

I've debugged the repro test locally. The issue is caused by JAVA calling SetThreadContext with Rip that's not on the shadow stack. That is not allowed when the application is CET enabled. While in non-strict mode (which is the default) Windows would fix invalid entries on shadow stack when executing dynamically generated code and code from shared libraries that are not compiled with /CETCOMPAT, setting thread context with Rip that's not somewhere on the shadow stack is not allowed and Windows always tears down the application. There is a comprehensive article describing these details here: https://techcommunity.microsoft.com/t5/windows-os-platform-blog/developer-guidance-for-hardware-enforced-stack-protection/ba-p/2163340

As for the testing app, down the call chain of the JNI_CreateJavaVM, it hits an access violation accessing address 0 in code that's dynamically generated by JAVA (guessing, since it is not in any known module and it is called from the jvm.dll). My guess is that it is a null reference. Since it doesn't crash when CET is disabled for the dotnet app, I assume this is somehow expected and handled in JAVA. When handling it, it decides to resume to an address that is not on the shadow stack and Windows tear down the application. I could see that before the crash below, the RtlRestoreContext was called from within the KiUserExceptionDispatch and it wanted to resume at 0000015181a503d6`. WinDbg was unable to show stack trace at the crash time since JAVA doesn't seem to provide Windows unwind info for its generated code, but dumping the shadow stack showed the actual stack nicely (this is from a run using my own build of .NET runtime 9 with debug symbols, obviously the JVM symbols other than the JNI_CreateJavaVM are garbage, as there is no PDB for the jvm.dll):

:000> dps @ssp
0000003d`72cfef18  00007fff`0c93404c ntdll!KiUserExceptionDispatch+0x3c
0000003d`72cfef20  00000151`81a503d4
0000003d`72cfef28  00007ffd`a4da28f6 jvm!JVM_RegisterSignal+0xed6
0000003d`72cfef30  00007ffd`a4e4952e jvm!JVM_RegisterSignal+0xa7b0e
0000003d`72cfef38  00007ffd`a4e4abf3 jvm!JVM_RegisterSignal+0xa91d3
0000003d`72cfef40  00007ffd`a4d18138 jvm!jio_vsnprintf+0x72208
0000003d`72cfef48  00007ffd`a4d4900a jvm!jio_vsnprintf+0xa30da
0000003d`72cfef50  00007ffd`a4c6c266 jvm!JNI_CreateJavaVM+0x86
0000003d`72cfef58  00007ffd`48e5279f
0000003d`72cfef60  00007ffd`48e52022
0000003d`72cfef68  00007ffd`a7c32fd3 coreclr!CallDescrWorkerInternal+0x83 [G:\git\runtime\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 74]
0000003d`72cfef70  00007ffd`a76d7570 coreclr!CallDescrWorkerWithHandler+0x130 [G:\git\runtime\src\coreclr\vm\callhelpers.cpp @ 63]
0000003d`72cfef78  00007ffd`a76d8113 coreclr!MethodDescCallSite::CallTargetWorker+0xb93 [G:\git\runtime\src\coreclr\vm\callhelpers.cpp @ 585]
0000003d`72cfef80  00007ffd`a724ef94 coreclr!MethodDescCallSite::Call+0x24 [G:\git\runtime\src\coreclr\vm\callhelpers.h @ 465]
0000003d`72cfef88  00007ffd`a72a0e07 coreclr!RunMainInternal+0x287 [G:\git\runtime\src\coreclr\vm\assembly.cpp @ 1202]
0000003d`72cfef90  00007ffd`a72a07ca coreclr!``RunMain'::`30'::__Body::Run'::`5'::__Body::Run+0x5a [G:\git\runtime\src\coreclr\vm\assembly.cpp @ 1274]

The 00000151`81a503d4 is the address of the crashing instruction

To answer @fmaeseele question - the whole reason for the workaround is that JAVA is not compatible with Intel CET. .NET 9 has enabled CET for the .NET apphost by default to enhance security. For processes with CET enabled, Windows doesn't support setting the current execution context to any location that is not already on the shadow stack (both in the non-strict and strict shadow stack modes). JVM does exactly that, hence Windows tear the process down. When you execute Java app via the java.exe, this issue doesn't occur, because the java.exe is not marked as compatible with CET and so Windows doesn't enable the shadow stack at all for it.

jkotas commented 2 months ago

@janvorli Could you please file a breaking change doc issue about the CET enablement (https://github.com/dotnet/docs) so that we have this compat issue officially documented?

The breaking change notice should talk about the general potential compat problem. It should not be JVM specific.

fmaeseele commented 2 months ago

I think the issue can be closed thanks to the detailled explanations and the workaround. Thank you all for your great work with .Net. Can I close it or do you need to have it still opened for breaking change doc label ?

jkotas commented 2 months ago

Breaking change doc issue: https://github.com/dotnet/docs/issues/42600

agocke commented 2 months ago

Closing as this has been documented as a known breaking change