Open Nemo-G opened 6 months ago
Add some extra info: Compile dotnet runtime under single thread. Build without SIMD enabled
Tagging subscribers to this area: @brzvlad, @kotlarmilos See info in area-owners.md if you want to be subscribed.
I'm going to provide more info here. I located the method to marshal and turned on MONO_VERBOSE_METHOD
.
And I hacked in interp_local_deadce
like
Here's the IL diff containing jiterp&bblocks ON vs OFF. https://www.diffchecker.com/4MXLTBTw/
It seems certain instructions was taken as dead but should be kept? And the mis-removal would cause problem in jiterp optimization?
In fact, with the hacking above. The program would work with only JITERP disabled
@BrzVlad I completely understand that you might be swamped. But would you take a look and provide some insights when you have time? Thanks in advance!
Thank you for taking the time to look into it. From the verbose log, it looks like the ldloca is generated with the incorrect source. The prefix1
instruction from inlined method is most likely a ldarga
. All roads point to this being a known issue: https://github.com/dotnet/runtime/pull/97650.
Are you sure you are using the latest net8 release ?
@BrzVlad Thanks for the inputs! We are indeed not working on the latest net8 but a fork with Unity specific changes. I will apply the patch and get you with the results later.
With the fix I can no longer see the BBLOCK optimization errors. Thanks again for the kind help! @BrzVlad
But still the program is not running correctly unless I disable INTERP_OPT_JITERPRETER. I got memory access OOB with callstack
I checked the method value in mono_runtime_invoke
and found the related method is
static class AutomaticWorldBootstrap
{
[RuntimeInitializeOnLoadMethod(RuntimeInitializeLoadType.BeforeSceneLoad)]
static void Initialize()
{
DefaultWorldInitialization.Initialize("Default World", false);
}
}
// attr definition
public enum RuntimeInitializeLoadType
{
AfterSceneLoad = 0,
BeforeSceneLoad,
AfterAssembliesLoaded,
BeforeSplashScreen,
SubsystemRegistration
}
[Scripting.RequiredByNativeCode]
[System.AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class RuntimeInitializeOnLoadMethodAttribute : Scripting.PreserveAttribute
{
public RuntimeInitializeOnLoadMethodAttribute() { this.loadType = RuntimeInitializeLoadType.AfterSceneLoad; }
public RuntimeInitializeOnLoadMethodAttribute(RuntimeInitializeLoadType loadType) { this.loadType = loadType; }
public RuntimeInitializeLoadType loadType
{
get { return m_LoadType; } private set { m_LoadType = value; }
}
RuntimeInitializeLoadType m_LoadType;
}
Although the stack evaluation failed in IsUserCattrProvider
, I can't make sure if it is the consequence or the root cause. What info can I provide regarding the jiterpreter logic in order to help further debug?
d62ff0ba2487e15b0d62ecadb7471019682b61d0 This is the only commit my branch doesn't have but I think it is not related.
And here is the mono verbose and jiterp stats log before MOOB
MONO_WASM: // jitted 471 bytes; 0 traces (0.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 2, fused: 1; back-branches emitted: 0, failed: 1 (0.0%);
MONO_WASM: // time: 0ms generating, 0ms compiling wasm.
MONO_WASM: // jitted 942 bytes; 1 traces (50.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 4, fused: 2; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 3ms generating, 1ms compiling wasm.
dotnet.native.8.0.0.pi5hm4ztm8.js:8 Input Manager initialize...
dotnet.native.8.0.0.pi5hm4ztm8.js:8 UnloadTime: 0.700000 ms
MONO_WASM: // jitted 1267 bytes; 2 traces (66.7%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 4, fused: 2; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 3ms generating, 3ms compiling wasm.
MONO_WASM: // jitted 1573 bytes; 3 traces (75.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 5, fused: 2; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 4ms generating, 4ms compiling wasm.
MONO_WASM: // jitted 1974 bytes; 4 traces (80.0%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 5, fused: 4; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 4ms generating, 5ms compiling wasm.
MONO_WASM: // jitted 2321 bytes; 5 traces (83.3%) (0 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 7, fused: 5; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 5ms generating, 7ms compiling wasm.
MONO_WASM: // jitted 2679 bytes; 6 traces (85.7%) (1 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 7, fused: 5; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 5ms generating, 8ms compiling wasm.
MONO_WASM: // jitted 3935 bytes; 7 traces (87.5%) (1 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 8, fused: 17; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 5ms generating, 9ms compiling wasm.
MONO_WASM: // jitted 4543 bytes; 8 traces (88.9%) (1 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 8, fused: 20; back-branches emitted: 0, failed: 2 (0.0%);
MONO_WASM: // time: 6ms generating, 10ms compiling wasm.
MONO_WASM: // jitted 5000 bytes; 9 traces (90.0%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 10, fused: 21; back-branches emitted: 0, failed: 3 (0.0%);
MONO_WASM: // time: 7ms generating, 11ms compiling wasm.
MONO_WASM: // jitted 5474 bytes; 10 traces (90.9%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 14, fused: 21; back-branches emitted: 0, failed: 3 (0.0%);
MONO_WASM: // time: 7ms generating, 13ms compiling wasm.
MONO_WASM: // jitted 6292 bytes; 11 traces (91.7%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 14, fused: 26; back-branches emitted: 0, failed: 3 (0.0%);
MONO_WASM: // time: 7ms generating, 14ms compiling wasm.
MONO_WASM: // jitted 6705 bytes; 12 traces (92.3%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 14, fused: 27; back-branches emitted: 1, failed: 3 (25.0%);
MONO_WASM: // time: 8ms generating, 15ms compiling wasm.
MONO_WASM: // jitted 7195 bytes; 13 traces (92.9%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 15, fused: 28; back-branches emitted: 2, failed: 4 (33.3%);
MONO_WASM: // time: 9ms generating, 16ms compiling wasm.
MONO_WASM: // jitted 7597 bytes; 14 traces (93.3%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 18, fused: 28; back-branches emitted: 2, failed: 4 (33.3%);
MONO_WASM: // time: 9ms generating, 18ms compiling wasm.
MONO_WASM: // jitted 7977 bytes; 15 traces (93.8%) (2 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 21, fused: 28; back-branches emitted: 2, failed: 4 (33.3%);
MONO_WASM: // time: 9ms generating, 19ms compiling wasm.
MONO_WASM: // jitted 9076 bytes; 16 traces (88.9%) (3 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 30, fused: 28; back-branches emitted: 2, failed: 4 (33.3%);
MONO_WASM: // time: 10ms generating, 20ms compiling wasm.
MONO_WASM: // jitted 11112 bytes; 17 traces (89.5%) (3 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 52, fused: 28; back-branches emitted: 2, failed: 4 (33.3%);
MONO_WASM: // time: 10ms generating, 21ms compiling wasm.
MONO_WASM: // jitted 11362 bytes; 18 traces (90.0%) (4 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 55, fused: 28; back-branches emitted: 2, failed: 4 (33.3%);
MONO_WASM: // time: 11ms generating, 22ms compiling wasm.
MONO_WASM: // jitted 11985 bytes; 19 traces (90.5%) (4 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 60, fused: 29; back-branches emitted: 3, failed: 4 (42.9%);
MONO_WASM: // time: 11ms generating, 24ms compiling wasm.
Method Unity.Entities.AutomaticWorldBootstrap:Initialize (), optimized 0, original code:
IL_0000: ldstr "Default World"
IL_0005: ldc.i4.0
IL_0006: call 0x0a0002a0
IL_000b: pop
IL_000c: ret
IL_0000 ldstr , sp 0,
IL_0005 ldc.i4.0 , sp 1, O String
IL_0006 call , sp 2, I4
IL_000b pop , sp 1, O World
IL_000c ret , sp 0,
Unoptimized IR:
BB0:
IL_ffffffff: tier_enter_method [nil <- nil],
BB1:
IL_0000: ldstr [0 <- nil], 0
IL_0005: ldc.i4.0 [1 <- nil],
IL_0006: call [2 <- c:], Unity.Entities.DefaultWorldInitialization:Initialize (string,bool)
IL_000c: ret.void [nil <- nil],
Runtime method: Unity.Entities.AutomaticWorldBootstrap:Initialize () 0x6984cc
Locals size 0
Calculated stack height: 2, stated height: 8
IR_0000: tier_enter_method [nil <- nil],
IR_0001: ldstr [0 <- nil], 0
IR_0004: ldc.i4.0 [8 <- nil],
IR_0006: call [0 <- 0], Unity.Entities.DefaultWorldInitialization:Initialize (string,bool)
IR_000a: ret.void [nil <- nil],
SEQ POINT MAP FOR Initialize:
MONO_WASM: // jitted 12518 bytes; 20 traces (90.9%) (4 rejected); 0 jit_calls; 0 interp_entries
MONO_WASM: // cknulls eliminated: 60, fused: 30; back-branches emitted: 4, failed: 6 (40.0%);
MONO_WASM: // time: 11ms generating, 25ms compiling wasm.
dotnet.native.8.0.0.pi5hm4ztm8.js:8 [Test] LoadingSystem.OnCreate invoked
dotnet.native.8.0.0.pi5hm4ztm8.js:8 RuntimeError: memory access out of bounds
There have been some serious changes in this area in .net9. It would be worthwhile to try it out. Maybe @kg can provide some help with diagnosing this latest failure
As a starting point, you could use MONO_VERBOSE_METHOD to dump IsUserCattrProvider. The jiterpreter should automatically engage its diagnostics for the method and we can examine what it's doing to see if it makes sense.
@BrzVlad Thanks for the suggestion. In fact we started our project investigation with net9 alpha. The only reason we decided to use net8 is that it is officially released at the time. When do you think that net9 can be considered stable? We will definitely come back to integrate with net9 by then.
localhost-1715442485731.log @kg Here's the log with IsUserCattrProvider enabled.
Is there a reason why it says dotnet.runtime.8.0.0.90q4mwuy84.js
? Are you not able to use the latest version of 8.0?
Will still investigate, but you may be missing bug fixes.
Also, are you running any third party WASM code in your environment? This crash appears to indicate memory corruption, specifically the reserved low memory region in the WASM heap that's supposed to be all zeroes. If the low memory region is corrupted when the jiterpreter is enabled, that can turn NullReferenceExceptions into 'memory access out of bounds' errors. I refer to this area as the "zero page".
If the zero page is being corrupted, you can disable zero page optimizations by turning off the jiterpreter-zero-page-optimization
option. See https://github.com/dotnet/runtime/blob/main/docs/design/mono/jiterpreter.md#configuring-the-jiterpreter for information on how to configure the jiterpreter.
You could also disable null check elimination, but in this case I don't think that's related.
If I found the correct implementation of IsUserCattrProvider
, this would potentially occur if the obj
passed into it is not a Type
. The first isinst will produce null, and the second isinst will be type-checking null (which is safe with zero page optimizations as long as the heap isn't corrupted).
You could also examine the bottom of WASM memory in the devtools debugger when this crash happens. It should be all zeroes.
@kg Yes there are third party natives, I'm working on Unity :) My team started to add a lot Unity specific changes on top of NET8 to make it work on WebGL platform without AOT (to keep a lower memory on mobile devices). In fact this issue comes from one of our clients. As we need a stable version and not sure if any recurssion issue may be introduced when applying the latest change of NET8, we just let the net8 stay on a specific version unless there are problems. And only the wasm changes are necessary for us.
I will try the jiterp options first and try to merge the latest NET8 later.
Confirmed that there is no MOOB when jiterpreter-zero-page-optimization
is turned off.
Confirmed that there is no MOOB when
jiterpreter-zero-page-optimization
is turned off.
For the interim it should be safe to just run with that option turned off, and the performance penalty is pretty small (something like 1-5% for jiterpreter traces that do specific operations).
For the interim it should be safe to just run with that option turned off, and the performance penalty is pretty small (something like 1-5% for jiterpreter traces that do specific operations).
Will try that temporarily for this specific project. Thanks for the help! But from your analysis, it seems there is an NPE which corrupts the 'zeroPage'. Instead of disabling the zero page opts, is there an 'official way' to solve this problem? I mean in what directions that I can try to analyze further? If there are other solutions, I definitely do not want to disable zero page opts by default for every clients.
I do not know of any good way to detect zero page corruption other than frequently verifying the zero page after invoking third party code. The jiterpreter does some basic checks of the zero page in order to identify this kind of corruption, and it will shut off the optimizations if it detects it. But it can't always detect it.
Also, I believe we pass --low-memory-unused
to wasm-opt
when optimizing the runtime, so the runtime itself may end up containing similar zero page optimizations in its C. I'm not sure when or how wasm-opt
exploits this property. I think clang may also have some optimizations of this sort in it. So you may potentially experience related crashes in other code.
You are correct. I just checked the build binlog and found that -O3 --low-memory-unused --zero-filled-memory...
were passed to wasm-opt
.
As for the 3rd party codes, except for Unity engine native modules, 3rd party native libs are allowed to be involved in the build process. If this crash can be triggered by any 3rd party native(wasm), it seems disabling the zero page optimization would be the only way to ensure the game won't crash.
Description
I was testing a project using
Marshal.GetFunctionPointerForDelegate
in both Debug and Release mode. The code fragment causing problem is:I first tested with wrapper func in Debug mode. It failed to find the native2managed function because
orig_method
is NULL in https://github.com/dotnet/runtime/blob/ca4f0fe37455882baa00c75b1ef30a7ff1494457/src/mono/mono/mini/interp/interp.c#L3327 (I can understand somehow because both m2n or n2m function in pinvoke-table.h was generated at compile time. Lambda function may not contain necessary info. But is it a bug? )So I removed the wrapper and marshal managedFn directly. I got no problem in Debug mode. However, I met assertion failure in Release mode: https://github.com/dotnet/runtime/blob/ca4f0fe37455882baa00c75b1ef30a7ff1494457/src/mono/mono/mini/interp/transform.c#L9007 (local_ref_count == 0 at the time) (Normally a fatal error won't give me exact mono trace so that I changed the related runtime code by setting
mono_error_set_xxx
to continue the program. This helped me to get the managed stacktraces that caused the problem.)And after that I did some tests by comparing what made the difference between Debug and Release. Eventually I located the interp optimization options because I found the code path in Debug and Release mode was totally different. With "-jiterp,-bblocks" disabled, everything seems fine.
As I'm not an expert in mono, I do not quite understand the optimization logic which makes the debugging full of struggle. Is there any documents regarding the interp optimization logic? Help is needed!
Reproduction Steps
Just by called the managed code(after mono runtime is initialized of course)
Expected behavior
No error
Actual behavior
Assertion failed and program was terminated
Regression?
I've only tested in .NET 8. Not sure if the behaviour is the same in .NET9
Known Workarounds
Disable part of interp_opts. The minimum combination is to disable INTERP_OPT_JITERPRETER and INTERP_OPT_BBLOCKS. It works fine in debug mode which I checked the source code that no interp_opts is enabled.
Configuration
.NET8 Wasm Tested in Chrome/Edge (v8) I haven't tested against Webkit, but I guess this may not be related to browser.
Other information
No response