Closed srxqds closed 5 months ago
@lateralusX @BrzVlad can you help to check these problem?
This looks like when hot reload is enabled, the roots from the static fields of that class are not being registered at all to the GC. @lambdageek Do you understand what might be going on here ?
yeah, can you tell me where register static field to roots or scan static field
https://github.com/dotnet/runtime/blob/e12e2fa6cbdd1f4b0c8ad1b1e2d960a480c21703/src/mono/mono/metadata/object.c#L2231 is where we allocate memory for the static fields. If some of those fields are refs, we allocate the memory with mono_gc_alloc_fixed
which registers the memory as root, together with a gc descriptor for which slots are to be scanned. During gc, they are then scanned here https://github.com/dotnet/runtime/blob/e12e2fa6cbdd1f4b0c8ad1b1e2d960a480c21703/src/mono/mono/sgen/sgen-gc.c#L1288
I'm not familiar with the implications of hot reload over these static fields.
thank you @BrzVlad waiting for @lambdageek for help, or anyone can help me to reproduce it?
All you do is just enable or disable the hot reload component? You're not actually applying any changes?
What happens if you disable hot reload, but set the execution mode to mono_jit_set_aot_mode(MONO_AOT_MODE_INTERP_ONLY);
?
You're not actually applying any changes?
yes, I have not apply any change
What happens if you disable hot reload, but set the execution mode to mono_jit_set_aot_mode(MONO_AOT_MODE_INTERP_ONLY); ?
the problem also exists.
@lambdageek
anyone can take a look at this question
we have comment out this line https://github.com/dotnet/runtime/blob/0f6df43ff9bfb98f0fcac781125ce9b4e5b5ba66/src/mono/mono/metadata/assembly-load-context.c#L43
but the alc is not unloading.
and the second clue is the FTicker Internal_GetCoreTicker_Cache
object have been gc, I have found call the destructor function
it should introduce this problem.
but why it work on jit, but crash when interp enable.
@vargaz can you help me?
we found that call mono_gc_walk_heap
can get the same FTicker object after the destructor has been called.
the destructor call FTicker destructor 000001711BD677B8 //call mono_gc_walk_heap ObjectPtr:000001711BD677B8 Type:UnrealEngine.FTicker (target)
how can we fixed it, our project is blocking by this problem. anyone can help me, because the gc is diffcult to debug.
I have set gc_level=9, because the log file is too big for upload, I paste the log about the FTicker
object:
in jit mode:
2024-04-01 22:13:13 Allocated object 0000019369C01030, vtable: 000001931E396508 (FTicker), size: 40d
2024-04-01 22:13:16 Wbarrier store at 0000001DFAD79048 to 0000019369C01030 (FTicker)
2024-04-01 22:13:16 Adding remset at 0000001DFAD79048
2024-04-01 22:14:16 Added finalizer for object: 000001936B061D00 (Gen2GcCallback) (63) to old table
2024-04-01 22:14:16 Added finalizer for object: 0000019369C01030 (FTicker) (4) to nursery table
2024-04-01 22:14:16 Added finalizer for object: 0000019369C010D8 (DemoInputProcessor) (5) to nursery table
2024-04-01 22:14:16 Adding global remset for 0000019365C53498
2024-04-01 22:14:16 Precise copy of 0000019369C01030 from 000001936B0946A0
2024-04-01 22:14:16 Overwrote field at 000001936B0946A0 with 0000019302403600 (was: 0000019369C01030)
2024-04-01 22:14:16 Precise copy of 0000019369C14ED0 from 000001936B094BC0
2024-04-01 22:14:16 Precise copy of 0000019369C01030 from 0000001DFAD75B78
2024-04-01 22:14:16 (already forwarded to 0000019302403600)
2024-04-01 22:14:16 Promoting finalization of object 0000019302403600 (FTicker) (was at 0000019369C01030) to major table
2024-04-01 22:14:16 Precise scan of gray area post fin
in interp mode:
2024-04-01 22:18:08 Allocated object 000002D52D801030, vtable: 000002D56694DC88 (FTicker), size: 40d
2024-04-01 22:18:08 Allocated object 000002D52D801058, vtable: 000002D56694A320 (TDelegate__float_Ret_bool), size: 128d
2024-04-01 22:18:08 Wbarrier store at 000000E425B776A8 to 000002D52D801030 (FTicker)
2024-04-01 22:18:08 Adding remset at 000000E425B776A8
2024-04-01 22:18:26 Added finalizer for object: 000002D52ECDDD00 (Gen2GcCallback) (63) to old table
2024-04-01 22:18:26 Added finalizer for object: 000002D52D801030 (FTicker) (4) to nursery table
2024-04-01 22:18:26 Added finalizer for object: 000002D52D8010D8 (DemoInputProcessor) (5) to nursery table
2024-04-01 22:18:26 Precise copy of 000002D52D801030 from 000000E425B74838
2024-04-01 22:18:26 Queueing object for finalization: 000002D51E6CB790 (FTicker) (was at 000002D52D801030) (14)
2024-04-01 22:18:26 Precise copy of 000002D52D881360 from 000000E425B74838
// the new FTicker object has been Finalizing
2024-04-01 22:18:26 Wbarrier store at 000000E43C28F780 to 0000000000000000 (null)
2024-04-01 22:18:26 Finalizing object 000002D51E6CB790 (FTicker)
2024-04-01 22:18:26 Wbarrier store at 000002D52ED70418 to 0000000000000000 (null)
// below use old object address
2024-04-01 22:19:04 Wbarrier store at 000000E425B75AA8 to 000002D52D801030 (�`ls�)
2024-04-01 22:19:04 Adding remset at 000000E425B75AA8
2024-04-01 22:19:04 Wbarrier store at 000000E425B75AB0 to 000002D52DA1D5A0 (TDelegate__float_Ret_bool)
2024-04-01 22:19:04 Adding remset at 000000E425B75AB0
2024-04-01 22:19:32 Wbarrier store at 000000E425B75718 to 000002D52D801030 (�`ls�)
I have modified MINT_STSFLD_O
opcode with mono_gc_wbarrier_generic_store_internal
seems solve the problem:
but I have hit another problem: the generic method out parameter pass also have gc object error.
the second crash show below:
and the source code is :
Interesting. So it turns out we actually can allocate static field storage as part of another object when using hot reload. (https://github.com/dotnet/runtime/blob/main/src/mono/mono/component/hot_reload.c#L3092). The interpreter was not aware of this so it doesn't do write barriers when storing to static fields. My understanding is that you still encounter this crash when having hot reload enabled (not just by using mono interpreter) ?
If static field storage is part of another object then you are correct that we need write barriers when we store into it. MINT_STSFLD_O and also MINT_STSFLD_VT would need write barriers. For the latter you would need to either pass the field_klass to the opcode so you can use mono_value_copy_internal
or hack a call to sgen_card_table_mark_range
for the area that is being written.
Also, from the debugger screenshot, the bad object 0x1f4f1dee470 has its vtable 0x1f4f309df01. Because the vtable has its least significant bit 1, it is a sign that the value of the vtable is actually the location of the new object. This means that the object from the crash was promoted to location 0x1f4f309df00. You could investigate the type of object if it is something out of the ordinary.
In this type of crashes, it would be useful to run with MONO_GC_DEBUG=check-remset-consistency
. It should point to locations where there are missing cards.
thank you, I will continue to try tomorrow. yeah, we have enable hot reload component or apply any hot reload delta.
Interesting. So it turns out we actually can allocate static field storage as part of another object when using hot reload. (https://github.com/dotnet/runtime/blob/main/src/mono/mono/component/hot_reload.c#L3092). The interpreter was not aware of this so it doesn't do write barriers when storing to static fields. My understanding is that you still encounter this crash when having hot reload enabled (not just by using mono interpreter) ?
If static field storage is part of another object then you are correct that we need write barriers when we store into it. MINT_STSFLD_O and also MINT_STSFLD_VT would need write barriers. For the latter you would need to either pass the field_klass to the opcode so you can use
mono_value_copy_internal
or hack a call tosgen_card_table_mark_range
for the area that is being written.Also, from the debugger screenshot, the bad object 0x1f4f1dee470 has its vtable 0x1f4f309df01. Because the vtable has its least significant bit 1, it is a sign that the value of the vtable is actually the location of the new object. This means that the object from the crash was promoted to location 0x1f4f309df00. You could investigate the type of object if it is something out of the ordinary.
In this type of crashes, it would be useful to run with
MONO_GC_DEBUG=check-remset-consistency
. It should point to locations where there are missing cards.
when I set MONO_GC_DEBUG=check-remset-consistency
, there are many object not set in ImmutableArray
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D06778 at offset 16 in object 000001679F4AC370 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC388, vtable: 00000167AD7176F0 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D25860 at offset 16 in object 000001679F4AC388 (System.Collections.Immutable.ImmutableArray`1) not found in remsets, but object is pinned.
2024-04-03 09:37:38 Scanning object 000001679F4AC3A0, vtable: 00000167AD8388F0 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D31200 at offset 16 in object 000001679F4AC3A0 (System.Collections.Immutable.ImmutableArray`1) not found in remsets, but object is pinned.
2024-04-03 09:37:38 Scanning object 000001679F4AC3B8, vtable: 00000167AD834CB8 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D30A50 at offset 16 in object 000001679F4AC3B8 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC3D0, vtable: 00000167AD835E30 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D25E50 at offset 16 in object 000001679F4AC3D0 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC3E8, vtable: 00000167AD8A0230 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D2F3D0 at offset 16 in object 000001679F4AC3E8 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC400, vtable: 00000167AD9564F0 (ImmutableSegmentedHashSet`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D2F698 at offset 16 in object 000001679F4AC400 (Microsoft.CodeAnalysis.Collections.ImmutableSegmentedHashSet`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC418, vtable: 00000167AD9D0830 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D30618 at offset 16 in object 000001679F4AC418 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC430, vtable: 00000167AD9DF6A0 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D30758 at offset 16 in object 000001679F4AC430 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC448, vtable: 00000167C566C978 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D33BE0 at offset 16 in object 000001679F4AC448 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC460, vtable: 00000167ADB89E50 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D33C08 at offset 16 in object 000001679F4AC460 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC478, vtable: 00000167D0DA51C0 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D33D78 at offset 16 in object 000001679F4AC478 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC490, vtable: 00000167ADC18240 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D48B48 at offset 16 in object 000001679F4AC490 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4AC4A8, vtable: 00000167ADC2B050 (ImmutableArray`1)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795D48D70 at offset 16 in object 000001679F4AC4A8 (System.Collections.Immutable.ImmutableArray`1) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8130, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBA0A0 at offset 40 in object 000001679F4A8130 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8168, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBA628 at offset 40 in object 000001679F4A8168 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A81A0, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBA648 at offset 40 in object 000001679F4A81A0 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A81D8, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBA668 at offset 40 in object 000001679F4A81D8 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8210, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBA688 at offset 40 in object 000001679F4A8210 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8248, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBB0B8 at offset 40 in object 000001679F4A8248 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8280, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBB198 at offset 40 in object 000001679F4A8280 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A82B8, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBB278 at offset 40 in object 000001679F4A82B8 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A82F0, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBB358 at offset 40 in object 000001679F4A82F0 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8328, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBB438 at offset 40 in object 000001679F4A8328 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
2024-04-03 09:37:38 Scanning object 000001679F4A8360, vtable: 00000167ACEF6E18 (SyntaxTrivia)
2024-04-03 09:37:38 Oldspace->newspace reference 0000016795CBB520 at offset 40 in object 000001679F4A8360 (Microsoft.CodeAnalysis.SyntaxTrivia) not found in remsets.
so this set may not useful, and the 0x1f4f309df00
class is System.Object so it doesn't provide much useful information for me.
the ImmutableArray<T>
is struct so
public static class ImmutableArray
{
/// <summary>
/// A two element array useful for throwing exceptions the way LINQ does.
/// </summary>
internal static readonly byte[] TwoElementArray = new byte[2];
/// <summary>
/// Creates an empty <see cref="ImmutableArray{T}"/>.
/// </summary>
/// <typeparam name="T">The type of element stored in the array.</typeparam>
/// <returns>An empty array.</returns>
public static ImmutableArray<T> Create<T>()
{
return ImmutableArray<T>.Empty;
}
The logs support the existing theory that the issue is only with storing into static fields. Am I assuming correctly that you didn't tweak MINT_STSFLD_VT
to emit write barriers ? It could look something like this:
diff --git a/src/mono/mono/mini/interp/interp.c b/src/mono/mono/mini/interp/interp.c
index f8e2ad02ad9..7b46b27fbcd 100644
--- a/src/mono/mono/mini/interp/interp.c
+++ b/src/mono/mono/mini/interp/interp.c
@@ -6325,8 +6325,9 @@ MINT_IN_CASE(MINT_BRTRUE_I8_SP) ZEROP_SP(gint64, !=); MINT_IN_BREAK;
MINT_IN_CASE(MINT_STSFLD_VT) {
MonoVTable *vtable = (MonoVTable*) frame->imethod->data_items [ip [2]];
INIT_VTABLE (vtable);
+ MonoClass *klass = (MonoClass*) frame->imethod->data_items [ip [4]];
gpointer addr = frame->imethod->data_items [ip [3]];
- memcpy (addr, locals + ip [1], ip [4]);
+ mono_value_copy_internal (addr, locals + ip [1], klass);
ip += 5;
MINT_IN_BREAK;
}
diff --git a/src/mono/mono/mini/interp/transform.c b/src/mono/mono/mini/interp/transform.c
index bdf48b59f03..68e6bad19e1 100644
--- a/src/mono/mono/mini/interp/transform.c
+++ b/src/mono/mono/mini/interp/transform.c
@@ -4635,8 +4635,12 @@ interp_emit_sfld_access (TransformData *td, MonoClassField *field, MonoClass *fi
if (G_LIKELY (!wide_data)) {
td->last_ins->data [0] = (guint16) vtable_index;
td->last_ins->data [1] = (guint16) addr_index;
- if (mt == MINT_TYPE_VT)
- td->last_ins->data [2] = GINT_TO_UINT16 (size);
+ if (mt == MINT_TYPE_VT) {
+ if (is_load)
+ td->last_ins->data [2] = GINT_TO_UINT16 (size);
+ else
+ td->last_ins->data [2] = (guint16) get_data_item_index (td, field_class);
+ }
} else {
WRITE32_INS (td->last_ins, 0, &vtable_index);
WRITE32_INS (td->last_ins, 2, &addr_index);
ot found in remsets
this change works, thank you @BrzVlad
but but why won't happen when not enable collectiable alc?
Description
We are embeding mono in UnrealEngine, when we upgrade .ne8.0(v8.0.2), we found sgen-gc will not set static field back after object move when enable hotreloadcomponent.
Reproduction Steps
We have not separate code to reproduct it, but I will describe it as detailed as possible.
we have restore object in static field, when next pass it to internal call the object address after the gc have move it, the value not set back. I show the code first:
the internal call
the code logic execute as below :
I show the behaviors with hotreload component enable or not.
if we not enable hotreload component, just use jit execute mode, it work correctly:
the first call
AddTicker
, we can checkout theself
object:and we check the
Internal_GetCoreTicker_Cache
static field:the second call the
self
object address have changed:if we enable hotreload component, the static field will not write back: the first call:
the second call: the self object address have not been changed but the
vtable
is nullExpected behavior
work correctly
Actual behavior
crash
Regression?
.net7.0 work correctly
Known Workarounds
no
Configuration
windows build v8.0.2 tag set excute mode
mono_jit_set_aot_mode(MONO_AOT_MODE_INTERP_ONLY);
Other information
No response