Closed fmaeseele closed 2 months ago
Use this project to reproduce the issue. Consider having JAVA_HOME environment variable defined so the sample can run.
Not saying that it's the root cause of the crash but you are using Marshal.UnsafeAddrOfPinnedArrayElement
on unpinned array:
The array must be pinned using a GCHandle before it is passed to this method. For maximum performance, this method does not validate the array passed to it; this can result in unexpected behavior.
Hi, Thank you for your comment. I could provide another sample using unsafe method instead. But the point is that it was perfectly working with .Net8 and just migrating to .Net9 make my software crashes. That's all this is about.
I wonder if JVM simply doesn't support CET processes https://github.com/dotnet/runtime/pull/103311 cc @mangod9 @jkotas
At least, the repro does work fine on net8.0 and crashed on net9.0, with this:
Doesn't seem like it's CET related. Adding @AaronRobinsonMSFT to take an initial look on whether it's an interop issue?
Hi, Still the same remark. I'm very found on migrating to .net9 for many reasons, but we can't because of this issue. .Net8 is working like a charm for our application. As we are a very big company in Europe, we simply can't migrate to .Net9 but this is gonna be a support issue as .Net8 is not a long term support.
A bit more simplified repro (with blittable structs): https://gist.github.com/EgorBo/4b67d51a5afa9e54668b4fc853875e2f
bu thank you guys for taking care of this 👍
Doesn't seem like it's CET related. Adding @AaronRobinsonMSFT to take an initial look on whether it's an interop issue?
Once I disabled /CETCOMPAT flag for Corerun it has fixed the issue
Doesn't seem like it's CET related. Adding @AaronRobinsonMSFT to take an initial look on whether it's an interop issue?
Once I disabled /CETCOMPAT flag for Corerun it has fixed the issue Even if this workaround works, is this something planned to be fixed to work like .Net8 ? Is there a simple way for me to test this ?
Ok interesting. @fmaeseele are you running with an apphost ? The workaround would be to disable using CETCompat =false in your csproj.
Adding @janvorli as fyi.
Ah, nice that we provide a runtime knob/msbuild property to disable it, the problem that it's not clear who and when should use it - how users are expected to debug such crashes and find out about the property?
Ok interesting. @fmaeseele are you running with an apphost ? The workaround would be to disable using CETCompat =false in your csproj.
Adding @janvorli as fyi.
The project is the one provided, but to answer your question, unlike the provided sample, at the end you are right, this is an appHost project. Could you tell me how I can workaround this issue from you comment please ?
Try adding <CETCompat>false</CETCompat>
in your .csproj file, within the property group
Try adding
<CETCompat>false</CETCompat>
in your .csproj file, within the property group
Working! But what is the impact of this property ?
It disables compatibility with Control Flow Enforcement Technology which is a security mitigation that protects against ROP (return oriented programming) hacking techniques that is enabled by default starting in .NET 9.
Ah, nice that we provide a runtime knob/msbuild property to disable it, the problem that it's not clear who and when should use it - how users are expected to debug such crashes and find out about the property?
we should get it documented appropriately. @fmaeseele, in 9 we have enabled CET (control-flow enforcement technology) which is enabled on recent x64 hardware (only on windows). It should be in compat mode so shouldn't cause breaks, so we will look into this. Perhaps Java isn't quite compliant.
Ah, nice that we provide a runtime knob/msbuild property to disable it, the problem that it's not clear who and when should use it - how users are expected to debug such crashes and find out about the property?
we should get it documented appropriately. @fmaeseele, in 9 we have enabled CET (control-flow enforcement technology) which is enabled on recent x64 hardware (only on windows). It should be in compat mode so shouldn't cause breaks, so we will look into this. Perhaps Java isn't quite compliant.
I whish to be still able to run a Java runtime under .net9 (like in .net8) through interop unless it is clearly mentioned that I can't and why, and what would be the workaround. Thank you guys!
for now using the workaround should keep CET disabled for your scenario (which is same as what it was in 8)
I updated the issue with the workaround but my question remains: is the workaround going to be a permanent fix and what am I missing using this workaround ? Or is it something that going to be fixed in .Net9 ?
The most likely thing is that the workaround will stay in place until Java becomes compatible with CET
From interop point of view, does someone could explain to me what is causing the issue ?
Enabling CET requires that all code in the process complies with the CET requirements and java is seemingly not compatible so the problem is not the interop code but the fact that as soon as java code runs, the CPU detects a CET violation, notifies the OS which then kills the process
Enabling CET requires that all code in the process complies with the CET requirements and java is seemingly not compatible so the problem is not the interop code but the fact that as soon as java code runs, the CPU detects a CET violation, notifies the OS which then kills the process
Thank you for the explanation.
Last question: is it something that going to be fixed in .Net9 final relase or not ?
Last question: is it something that going to be fixed in .Net9 final relase or not ?
we will investigate and update this issue. But since the workaround is easy it might stay this way
Thanks to all you guys. But frankly, such issue is annoying. Sap (my compagny), won't buy the workaround and should ask for explanation on why migrating to .net9 is causing issue. Unless you guys specify this issue in the release notes.
I've debugged the repro test locally. The issue is caused by JAVA calling SetThreadContext
with Rip that's not on the shadow stack. That is not allowed when the application is CET enabled. While in non-strict mode (which is the default) Windows would fix invalid entries on shadow stack when executing dynamically generated code and code from shared libraries that are not compiled with /CETCOMPAT, setting thread context with Rip that's not somewhere on the shadow stack is not allowed and Windows always tears down the application.
There is a comprehensive article describing these details here: https://techcommunity.microsoft.com/t5/windows-os-platform-blog/developer-guidance-for-hardware-enforced-stack-protection/ba-p/2163340
As for the testing app, down the call chain of the JNI_CreateJavaVM, it hits an access violation accessing address 0 in code that's dynamically generated by JAVA (guessing, since it is not in any known module and it is called from the jvm.dll). My guess is that it is a null reference. Since it doesn't crash when CET is disabled for the dotnet app, I assume this is somehow expected and handled in JAVA.
When handling it, it decides to resume to an address that is not on the shadow stack and Windows tear down the application. I could see that before the crash below, the RtlRestoreContext
was called from within the KiUserExceptionDispatch
and it wanted to resume at 00000151
81a503d6`.
WinDbg was unable to show stack trace at the crash time since JAVA doesn't seem to provide Windows unwind info for its generated code, but dumping the shadow stack showed the actual stack nicely (this is from a run using my own build of .NET runtime 9 with debug symbols, obviously the JVM symbols other than the JNI_CreateJavaVM are garbage, as there is no PDB for the jvm.dll):
:000> dps @ssp
0000003d`72cfef18 00007fff`0c93404c ntdll!KiUserExceptionDispatch+0x3c
0000003d`72cfef20 00000151`81a503d4
0000003d`72cfef28 00007ffd`a4da28f6 jvm!JVM_RegisterSignal+0xed6
0000003d`72cfef30 00007ffd`a4e4952e jvm!JVM_RegisterSignal+0xa7b0e
0000003d`72cfef38 00007ffd`a4e4abf3 jvm!JVM_RegisterSignal+0xa91d3
0000003d`72cfef40 00007ffd`a4d18138 jvm!jio_vsnprintf+0x72208
0000003d`72cfef48 00007ffd`a4d4900a jvm!jio_vsnprintf+0xa30da
0000003d`72cfef50 00007ffd`a4c6c266 jvm!JNI_CreateJavaVM+0x86
0000003d`72cfef58 00007ffd`48e5279f
0000003d`72cfef60 00007ffd`48e52022
0000003d`72cfef68 00007ffd`a7c32fd3 coreclr!CallDescrWorkerInternal+0x83 [G:\git\runtime\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 74]
0000003d`72cfef70 00007ffd`a76d7570 coreclr!CallDescrWorkerWithHandler+0x130 [G:\git\runtime\src\coreclr\vm\callhelpers.cpp @ 63]
0000003d`72cfef78 00007ffd`a76d8113 coreclr!MethodDescCallSite::CallTargetWorker+0xb93 [G:\git\runtime\src\coreclr\vm\callhelpers.cpp @ 585]
0000003d`72cfef80 00007ffd`a724ef94 coreclr!MethodDescCallSite::Call+0x24 [G:\git\runtime\src\coreclr\vm\callhelpers.h @ 465]
0000003d`72cfef88 00007ffd`a72a0e07 coreclr!RunMainInternal+0x287 [G:\git\runtime\src\coreclr\vm\assembly.cpp @ 1202]
0000003d`72cfef90 00007ffd`a72a07ca coreclr!``RunMain'::`30'::__Body::Run'::`5'::__Body::Run+0x5a [G:\git\runtime\src\coreclr\vm\assembly.cpp @ 1274]
The 00000151`81a503d4 is the address of the crashing instruction
To answer @fmaeseele question - the whole reason for the workaround is that JAVA is not compatible with Intel CET. .NET 9 has enabled CET for the .NET apphost by default to enhance security. For processes with CET enabled, Windows doesn't support setting the current execution context to any location that is not already on the shadow stack (both in the non-strict and strict shadow stack modes). JVM does exactly that, hence Windows tear the process down. When you execute Java app via the java.exe, this issue doesn't occur, because the java.exe is not marked as compatible with CET and so Windows doesn't enable the shadow stack at all for it.
@janvorli Could you please file a breaking change doc issue about the CET enablement (https://github.com/dotnet/docs) so that we have this compat issue officially documented?
The breaking change notice should talk about the general potential compat problem. It should not be JVM specific.
I think the issue can be closed thanks to the detailled explanations and the workaround. Thank you all for your great work with .Net. Can I close it or do you need to have it still opened for breaking change doc label ?
Breaking change doc issue: https://github.com/dotnet/docs/issues/42600
Closing as this has been documented as a known breaking change
Description
Calling CreateJVM function through interop crashes.
Reproduction Steps
See provided sample.
Expected behavior
Using interop to create a java runtime should not crash process.
Actual behavior
Using interop to create a java runtime crashes the process.
Regression?
Yes, comparing to .Net8
Known Workarounds
Usingfalse in the .csproj fixed the issue.
Configuration
Windows 11 Professionnel 23H2 22631.4169
Other information
No response