Closed radical closed 9 months ago
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.
Author: | radical |
---|---|
Assignees: | - |
Labels: | `area-System.Text.Json`, `test-enhancement` |
Milestone: | - |
S.T.J is also very close to the limit in the wasm interpreter configuration. If you raise the Jiterpreter limit by a few more megabytes the test suite will also hit V8's heap size limit and crash. So I think we've been just barely staying below this 2GB threshold for a while.
In order to understand the problem correctly, is this an issue of assembly size, number of tests, or specific tests having a very deep stack?
@eiriktsarpalis this is also a problem for Native AOT. The Json tests have been failing in the runtime-extra-platforms runs ever since they got enabled:
artifacts/bin/coreclr/linux.arm64.Release/build/Microsoft.NETCore.Native.targets(271,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command ""/__w/1/s/artifacts/bin/coreclr/linux.arm64.Release/x64/ilc/ilc" @"/__w/1/s/artifacts/obj/System.Text.Json.SourceGeneration.Roslyn4.4.Tests/Release/net8.0/native/System.Text.Json.SourceGeneration.Roslyn4.4.Tests.ilc.rsp"" exited with code 137
It's an OOM. The test does compile for me locally but takes forever, consumes gigabytes of memory, and compiles into a 150 MB executable with a 1.5 GB PDB. That's rather large.
There are three issues:
JsonConverter<T>
instances. Instances over value types are not shared and there's a lot of valuetypes being serialized. Some of them might not be intentional - for example ClassWithReadOnlyPropertyQueue
says it's a class
but it's in fact a struct
- there are dozens of these ClassWith
that are struct
s. Each unique JsonConverter<T>
instance over a struct costs more than 100 kB in size in the test. The 90% of this 100+ kB is caused by:CreateCastingConverter
generic method. This is a generic virtual method on a generic type. This patterns causes an N * M generic expansion in AOT. The test ends up calling the method with about 250 unique T's (we only count valuetypes). This means that each of the JsonConverter<T>
instances in the test get extra 250 specializations. There's hundreds of JsonConverter
instances in the app so this multiplies into unreasonable numbers (tens of thousands of CreateCastingConverter
method bodies) very quickly.Getting rid of the generic recursion would be probably the best first step. If we can turn some of the structs in the test into classes, that would help too. The generic virtual method is a huge problem because of the recursion but it will probably scale pretty poorly for large JSON models in general.
I have reasons to believe the recursion is still around handling of Queue<T>
because the warnings look like this:
ILC: AOT analysis warning IL3054: System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsyncEnumerable>d__4<Queue`1<Queue`1<Queue`1<Queue`1<SByte[]>>>>>: Generic expansion to 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1<Queue`1<Queue`1<Queue`1<Queue`1<Queue`1<SByte[]>>>>>>' was aborted due to generic recursion. An exception will be thrown at runtime if this codepath is ever reached. Generic recursion also negatively affects compilation speed and the size of the compilation output. It is advisable to remove the source of the generic recursion by restructuring the program around the source of recursion. The source of generic recursion might include: 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1', 'System.Text.Json.Serialization.JsonConverter`1', 'System.Text.Json.Serialization.JsonResumableConverter`1', 'System.Text.Json.Serialization.JsonDictionaryConverter`1', 'System.Text.Json.Serialization.JsonConverter.CreateCastingConverter<TTarget>()', 'System.Text.Json.Serialization.JsonConverterFactory.CreateCastingConverter<TTarget>()', 'System.Text.Json.Serialization.JsonConverter`1.CreateCastingConverter<TTarget>()', 'System.Text.Json.Serialization.Converters.CastingConverter`1', 'System.Text.Json.Serialization.Converters.JsonMetadataServicesConverter`1', 'System.Text.Json.Serialization.Metadata.DefaultJsonTypeInfoResolver.DeterminePropertyAccessors<T>(JsonPropertyInfo`1<T>,MemberInfo)', 'System.Text.Json.Serialization.Metadata.JsonPropertyInfo`1', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsync>d__1', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsyncEnumerable>d__4', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<DeserializeAsObjectAsync>d__6', 'System.Text.Json.Serialization.Metadata.JsonTypeInfo`1.<SerializeAsync>d__10', 'System.Text.Json.Serialization.Metadata.JsonParameterInfo`1'
If you want to see the extent of it, dotnet tool install sizoscope --global --framework net7.0-windows
, run sizoscope and open this file in the GUI:
System.Text.Json.SourceGeneration.Roslyn4.4.Tests.zip
You'll see something like this:
In order to understand the problem correctly, is this an issue of assembly size, number of tests, or specific tests having a very deep stack?
AFAIK stack depth is not the problem here, for the wasm interp scenario the native heap as a whole is growing until it hits a limit. If I had to guess it's probably a mix of type info, generated interpreter code, jitted wasm code (for the interp+jiterp scenario), etc. I don't know offhand how to get measurements out of the mono runtime on this stuff but if you need measurements I can probably figure it out. In the V8 console scenario, the native heap has to approach 2GB of allocated space before this happens. I'm not sure why the wasm AOT version is running out of memory, to be honest - I would expect it to use less memory, but it's possible that loaded AOT binaries count against V8's heap size limit.
There are tens of megabytes of JsonConverter
instances. Instances over value types are not shared and there's a lot of valuetypes being serialized. Some of them might not be intentional - for example ClassWithReadOnlyPropertyQueue says it's a class but it's in fact a struct - there are dozens of these ClassWith that are structs.
I think that's unavoidable given the nature of testing serializers, effectively we tend to define one new type per test and many of these tests need to include coverage for structs (to make sure that each facet of the serializer correctly accounts for struct semantics). I think we could try cutting a few struct definitions here and there when not strictly necessary from a test coverage perspective but I wouldn't expect huge gains from that exercise.
CreateCastingConverter generic method. This is a generic virtual method on a generic type. This patterns causes an N * M generic expansion in AOT.
I recently merged https://github.com/dotnet/runtime/pull/80755 that made changes to CreateCastingConverter
out of similar size concerns, and it appears that this method no longer needs to defined on JsonConverter<T>
. I'll follow-up with a PR.
@MichalStrehovsky new sizes after #87211 got merged.
With https://github.com/dotnet/runtime/pull/87276 it looks like the total size has dropped to 60 MB. Should we try re-enabling the tests once merged?
With #87276 it looks like the total size has dropped to 60 MB. Should we try re-enabling the tests once merged?
I have opened https://github.com/dotnet/runtime/pull/87256 which we can try once #87276 is merged.
PR is merged, feel free to rebase your PR branch.
With #87276 it looks like the total size has dropped to 60 MB. Should we try re-enabling the tests once merged?
It still fails:
[19:22:37] info: [STRT] System.Text.Json.Serialization.Tests.EnumConverterTests+UInt16EnumDictionary.SerilizeDictionaryWhenCacheIsFull
[19:22:43] info:
[19:22:43] info: <--- Last few GCs --->
[19:22:43] info:
[19:22:43] info: [92:0x563b327ef2f0] 89765 ms: Mark-Compact 2034.6 (2069.8) -> 2034.6 (2069.8) MB, 672.20 / 0.00 ms (average mu = 0.913, current mu = 0.006) allocation failure; scavenge might not succeed
[19:22:43] info: [92:0x563b327ef2f0] 90441 ms: Mark-Compact 2037.5 (2072.8) -> 2037.5 (2072.8) MB, 674.53 / 0.00 ms (average mu = 0.827, current mu = 0.002) allocation failure; scavenge might not succeed
[19:22:43] info:
[19:22:43] info:
[19:22:43] info: <--- JS stacktrace --->
[19:22:43] info:
[19:22:43] info:
[19:22:43] info: #
[19:22:43] info: # Fatal JavaScript out of memory: Reached heap limit
[19:22:43] info: #
[19:22:43] info:
[19:23:11] info: /home/helixbot/.jsvu/bin/v8: line 2: 92 Trace/breakpoint trap (core dumped) "/home/helixbot/.jsvu/engines/v8/v8" --snapshot_blob="/home/helixbot/.jsvu/engines/v8/snapshot_blob.bin" "$@"
Adding pre-test yield to few test classes will allow JS engine to run timer loops. It may give our GC chance to run.
xunit syntax is
public async Task InitializeAsync()
{
await Task.Yield();
}
But v8 timer loop is strange, still worth trying, I think.
cc @BrzVlad @vargaz
Did some experiments: Running with -s MAXIMUM_MEMORY=1073741824 (1GB) succeeds. Running with -s -s MAXIMUM_MEMORY=536870912 (512mb) fails in a controller manner:
[40m[32minfo[39m[22m[49m: [STRT] System.Text.Json.Serialization.Tests.StreamTests_AsyncWithSmallBuffer.VeryLargeJsonFileTest(payloadSize: 1000, ignoreNull: False, writeIndented: False)
[40m[32minfo[39m[22m[49m: console.debug: [MONO] GC_MINOR: (Nursery full) time 3.62ms, stw 3.62ms promoted 3302K major size: 41984K in use: 33033K los size: 62016K in use: 51968K
[40m[32minfo[39m[22m[49m: console.debug: [MONO] GC_MINOR: (Nursery full) time 3.92ms, stw 3.92ms promoted 3710K major size: 44608K in use: 36825K los size: 62016K in use: 51968K
[40m[32minfo[39m[22m[49m: console.debug: [MONO] GC_MINOR: (Nursery full) time 3.67ms, stw 3.67ms promoted 3709K major size: 48128K in use: 40615K los size: 62016K in use: 51968K
[41m[30mfail[39m[22m[49m: [FAIL] System.Text.Json.Serialization.Tests.StreamTests_AsyncWithSmallBuffer.VeryLargeJsonFileTest(payloadSize: 1000, ignoreNull: False, writeIndented: False)
[40m[32minfo[39m[22m[49m: System.OutOfMemoryException : Out of memory
[40m[32minfo[39m[22m[49m: at System.Text.Json.JsonReaderHelper.TranscodeHelper(ReadOnlySpan`1 )
[40m[32minfo[39m[22m[49m: at System.Text.Json.JsonSerializer.WriteString[List`1](List`1& , JsonTypeInfo`1 )
[40m[32minfo[39m[22m[49m: at System.Text.Json.Serialization.Tests.StreamTests.VeryLargeJsonFileTest(Int32 payloadSize, Boolean ignoreNull, Boolean writeIndented)
[40m[32minfo[39m[22m[49m: at System.Text.Json.JsonReaderHelper.TranscodeHelper(ReadOnlySpan`1 )
[40m[32minfo[39m[22m[49m: at System.Text.Json.JsonSerializer.WriteString[List`1](List`1& , JsonTypeInfo`1 )
So it looks like the linear memory usage is somewhere between 512mb and 1gb. Not sure what causes v8 to consume 2gb of memory.
I can reproduce this when running with v8 --max-old-space-size=2024
.
@pavelsavara @kg are there js objects we could be leaking/retaining?
Could this be heap growth? i.e. growing from 512mb to 1gb requires being able to allocate contiguously 512mb + 1gb in the v8 perm gen, and it's not big enough
@pavelsavara @kg are there js objects we could be leaking/retaining?
That unit test doesn't do any JS interop. The only one is that we hold Task/Promise pair for the main method.
I'm thinking about emscripten's posix emulation leaking. If there are VFS files created as part of the test, they would consume JS heap. What else ?
And I guess I should better understand how we allocate/free memory. I don't know what #ifdef
in major_free_swept_blocks
means yet.
A major GC block has 16kb and when we allocate blocks we allocate them in chunks (of 32 blocks at a time). Each block will be in a free list and will be used by the GC when needed. After a collection, in major_free_swept_blocks
we attempt to free blocks also in various chunk sizes, that is not identical to the original allocation. So we attempt to unmap in the middle of mapped memory, which I know doesn't work on Windows. Not sure how much benefit we get for this complication of avoiding to call into OS map/unmap.
To do more testing see https://github.com/dotnet/runtime/pull/88825#issuecomment-1634029173
It's unlikely we'll be able to address this in .NET 8, moving to future.
From https://github.com/dotnet/runtime/pull/88825#issuecomment-1634029173
It would be useful to get some GC log for those offending test suites. Env: MONO_LOG_LEVEL=debug and MONO_LOG_MASK=gc
cc @vitek-karas @artl93
closing as resolved but the large trim unfriendly test assemblies are still very much a thing
Running
System.Text.Json.Tests
with wasm/AOT crashes V8, and chrome. Issue: https://github.com/dotnet/runtime/issues/86164We run into limits while trying to run this test suite, and try to work around that in different ways. It would be extremely useful to possibly run the tests in parts. But I'm not sure how we could split this up. We could either split this into multiple projects if there is a good logical boundary for that. Or we could pick handful of big classes, and explicitly run them in a separate helix job.
cc @kg @pavelsavara for more details on the issues we are running into. cc @eiriktsarpalis