dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

System.Text.Json's generator causes considerable package size increase #77897

Closed Sergio0694 closed 1 year ago

Sergio0694 commented 1 year ago

We're currently working on migrating all our JSON serialization from Newtonsoft.Json to System.Text.Json in the Microsoft Store, and we're hitting some issues with respect to binary size after enabling the source generators for all our types. Our data models for JSON responses the Store client handles from service are about 260, which once annotated over a context, transitively cause a very large number of types (ie. JsonTypeInfo<T> properties) to be generated, precisely 742 of them (this also includes eg. collection types with any of these data types as element type, etc.). That's a lot 😅

When trying this out, this caused our package size to regress from 59MB to about 76MB, so that's a 17MB (~29%) increase. This is for a final package with x86, x64 and Arm64 architectures, compiled fully AOT with .NET Native, with trimming enabled. For reference, with trimming disabled we baseline around 80MB without System.Text.Json, so I'd expect the version with System.Text.Json to be around the 100MB mark. We'll have to investigate whether the tradeoff for this size regression is worth the benefits of the source generators here (specifically for us, faster performance and less memory use when deserializing responses, and reliable behavior that's trimmer-safe), but I wanted to open this issue to investigate whether the generator can also be improved to reduce the metadata increase it causes.

For context: all these models are generated in metadata only mode, so the size increase is with just the metadata support code. No serialization fast-path is generated. One thing I noticed is, we have dozens and dozens of files like this, generated due to the transitive closure of the types we have annotated:

Generated code (click to expand):
```csharp // #nullable enable annotations #nullable disable warnings // Suppress warnings about [Obsolete] member usage in generated code. #pragma warning disable CS0618 namespace MyProject { internal sealed partial class MyJsonSerializerContext { private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? _ICollectionSomeModelType; /// /// Defines the source generated JSON serialization contract metadata for a given type. /// public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> ICollectionSomeModelType { get => _ICollectionSomeModelType ??= Create_ICollectionSomeModelType(Options); } // Intentionally not a static method because we create a delegate to it. Invoking delegates to instance // methods is almost as fast as virtual calls. Static methods need to go through a shuffle thunk. private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> Create_ICollectionSomeModelType(global::System.Text.Json.JsonSerializerOptions options) { global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? jsonTypeInfo = null; global::System.Text.Json.Serialization.JsonConverter? customConverter; if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection))) != null) { jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo>(options, customConverter); } else { global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues>() { ObjectCreator = null, NumberHandling = default, SerializeHandler = null }; jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo, global::SomeModelType>(options, info); } return jsonTypeInfo; } } } ```

In this case, the Create_ICollectionSomeModelType method is effectively the same as dozens of other files, just with a different type argument. One idea for the generator to produce less code is for it to identify all these cases where the "default" path is used (ie. there's no custom converter known at compile time for this type), and just emit a shared stub just once, that the various properties can then reuse. For instance, it could be something like this:

Generated code (click to expand):
```csharp private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> Create_ICollectionForType(global::System.Text.Json.JsonSerializerOptions options) { global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? jsonTypeInfo = null; global::System.Text.Json.Serialization.JsonConverter? customConverter; if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection))) != null) { jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo>(options, customConverter); } else { global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues>() { ObjectCreator = null, NumberHandling = default, SerializeHandler = null }; jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo, T>(options, info); } return jsonTypeInfo; } ```

And then the property would just do:

public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo<global::System.Collections.Generic.ICollection<global::SomeModelType>> ICollectionSomeModelType
{
    get => _ICollectionSomeModelType ??= Create_ICollectionForType<global::SomeModelType>(Options);
}

This trick can work for all "default" cases for several collection types as well (maybe for some concrete types as well, would have to check). Point is: the generator should identify all generated methods that have a high amount of shared code with others, and if possible rewrite them to be a single shared, generic method that all other consumers can invoke, instead of having their own copy.

This would likely give us very nice improvements in terms of code size, if you consider this multiplied over several dozens of types.

Let me know if we want a separate meta issue to track general size improvements for the generators, or if we just want to use this one as reference 🙂

cc. @eiriktsarpalis

Known Workarounds

Don't use the generators at all. Not desireable due to performance and trimming concerns.

Configuration

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

Issue Details
### Description We're currently working on migrating all our JSON serialization from Newtonsoft.Json to System.Text.Json in the Microsoft Store, and we're hitting some issues with respect to binary size after enabling the source generators for all our types. Our data models for JSON responses the Store client handles from service are about 260, which once annotated over a context, transitively cause a very large number of types (ie. `JsonTypeInfo` properties) to be generated, precisely 742 of them (this also includes eg. collection types with any of these data types as element type, etc.). That's a lot 😅 When trying this out, this caused our package size to regress from 59MB to about 76MB, so that's a 17MB (~29%) increase. This is for a final package with x86, x64 and Arm64 architectures, compiled fully AOT with .NET Native, with trimming enabled. For reference, with trimming disabled we baseline around 80MB without System.Text.Json, so I'd expect the version with System.Text.Json to be around the 100MB mark. We'll have to investigate whether the tradeoff for this size regression is worth the benefits of the source generators here (specifically for us, faster performance and less memory use when deserializing responses, and reliable behavior that's trimmer-safe), but I wanted to open this issue to investigate whether the generator can also be improved to reduce the metadata increase it causes. For context: all these models are generated in **metadata only mode**, so the size increase is with just the metadata support code. No serialization fast-path is generated. One thing I noticed is, we have dozens and dozens of files like this, generated due to the transitive closure of the types we have annotated:
Generated code (click to expand):
```csharp // #nullable enable annotations #nullable disable warnings // Suppress warnings about [Obsolete] member usage in generated code. #pragma warning disable CS0618 namespace MyProject { internal sealed partial class MyJsonSerializerContext { private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? _ICollectionSomeModelType; /// /// Defines the source generated JSON serialization contract metadata for a given type. /// public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> ICollectionSomeModelType { get => _ICollectionSomeModelType ??= Create_ICollectionSomeModelType(Options); } // Intentionally not a static method because we create a delegate to it. Invoking delegates to instance // methods is almost as fast as virtual calls. Static methods need to go through a shuffle thunk. private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> Create_ICollectionSomeModelType(global::System.Text.Json.JsonSerializerOptions options) { global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? jsonTypeInfo = null; global::System.Text.Json.Serialization.JsonConverter? customConverter; if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection))) != null) { jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo>(options, customConverter); } else { global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues>() { ObjectCreator = null, NumberHandling = default, SerializeHandler = null }; jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo, global::SomeModelType>(options, info); } return jsonTypeInfo; } } } ```
In this case, the `Create_ICollectionSomeModelType` method is effectively the same as dozens of other files, just with a different type argument. One idea for the generator to produce less code is for it to identify all these cases where the "default" path is used (ie. there's no custom converter known at compile time for this type), and just emit a shared stub just once, that the various properties can then reuse. For instance, it could be something like this:
Generated code (click to expand):
```csharp private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> Create_ICollectionForType(global::System.Text.Json.JsonSerializerOptions options) { global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? jsonTypeInfo = null; global::System.Text.Json.Serialization.JsonConverter? customConverter; if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection))) != null) { jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo>(options, customConverter); } else { global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues>() { ObjectCreator = null, NumberHandling = default, SerializeHandler = null }; jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo, T>(options, info); } return jsonTypeInfo; } ```
And then the property would just do: ```csharp public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> ICollectionSomeModelType { get => _ICollectionSomeModelType ??= Create_ICollectionForType(Options); } ``` This trick can work for all "default" cases for several collection types as well (maybe for some concrete types as well, would have to check). Point is: the generator should identify all generated methods that have a high amount of shared code with others, and if possible rewrite them to be a single shared, generic method that all other consumers can invoke, instead of having their own copy. This would likely give us very nice improvements in terms of code size, if you consider this multiplied over several dozens of types. Let me know if we want a separate meta issue to track general size improvements for the generators, or if we just want to use this one as reference 🙂 cc. @eiriktsarpalis ### Reproduction Steps n/a ### Expected behavior n/a ### Actual behavior n/a ### Regression? _No response_ ### Known Workarounds _No response_ ### Configuration - System.Text.Json 7.0.0-rc.2.22472.3 ### Other information _No response_
Author: Sergio0694
Assignees: -
Labels: `area-System.Text.Json`
Milestone: -
krwq commented 1 year ago

Assuming the Create_* methods are actually very close there is potential for optimization here. Is this project public by any chance? It would be really useful if we could get some test app on which we could test the size when we pick this up.

Sergio0694 commented 1 year ago

"Is this project public by any chance?"

It's the Microsoft Store, so it's unfortunately not public 😅 Happy to direct you to the repo though and help creating a repro if you ping me on Teams.

eiriktsarpalis commented 1 year ago

Increase in the static footprint of a source generated application is to be expected, to an extent. Effectively it is trading off application size with runtime code generation.

That being said, there certainly are opportunities to reduce the size of source generated code, e.g. by sharing metadata instantiation code between multiple types such as repeated generic type instantiations.

Sergio0694 commented 1 year ago

@eiriktsarpalis Yup exactly, we'd be fine with some increase, but it just feels like we can do a bit better at least 🙂 The way I see it there's at least two things that can greatly reduce the size increase:

krwq commented 1 year ago

We should definitely try to grab all of the low hanging fruits :-) Once we do that we can re-asses if we need/can do more.

eiriktsarpalis commented 1 year ago

Stop generating type info (and hierarchies) for types appearing in properties explicitly marked with [JsonIgnore]. This is something I don't really understand, like, is this a bug?

It is intentional. The idea is that ignored properties should still be accessible to users wishing to modify the JSON contract for a given type.

Sergio0694 commented 1 year ago

Ah, I see. That's unfortunate in our case, as it seems to be causing a pretty big number of types to be brought in even though they're not actually needed. I'll see if maybe there's a way we could modify our models to remove those properties entirely, but it would really be nice if there was also a way to just tell the generator to really skip a property. Like, random idea, a new ExcludeFromSourceGeneration named property on [JsonIgnore] that defaults to false, or something. As in, in cases where the extra flexibility afterwards is not needed, this just seems to cause a lot of metadata increase for no gain 🤔

EDIT: looking at our codebase, it seems this amounts to almost half of the generated code 😥

krwq commented 1 year ago

@eiriktsarpalis I think it makes sense to remove them completely, users can add them back if they need them, right?

eiriktsarpalis commented 1 year ago

We could, although it's likely some users will take a dependency on the .NET 7 behavior. We'd need to file a breaking change.

Sergio0694 commented 1 year ago

Having that be changed to be opt-in sounds like a great solution to me. I'd also be happy to help if you wanted to try this out with a nightly build from a PR, I could use it in the Store and see how much of an impact it'd have, so we can get an idea 🙂

krwq commented 1 year ago

cc: @SilentCC who volunteered to help with JsonIgnore investigation

Sergio0694 commented 1 year ago

Sharing another slightly different example we also just hit in the Store. Consider this:

[JsonSerializable(typeof(MyModel))]
public partial class MyContext : JsonSerializerContext
{
}

public class MyModel
{
    private List<string> _list = new();

    public IReadOnlyList<string> List => _list;
}

This will cause the generator to also generate type info metadata for List<string>, even though it's not needed.

This also makes me think: @eiriktsarpalis would it make sense to consider adding a new JsonSourceGenerationMode mode to support scenarios where you only want to serialize JSON models? Ie. we have several cases where we have models that we only ever need to serialize stuff, but never to deserialize, so that seems like a case where at least some amount of additional metadata could be ignored? This also includes eg. all the supporting delegates and helper functions to eg. create an instance of the type through a constructor, to set any of the properties, etc. That's quite a lot of supporting infrastructure that would seem not needed in all these cases? I'm not seeing a way to opt-out from that currently, as AFAIK "Serialization" is just a superset of "Metadata". What do you think, does this make sense? Happy to open a separate proposal for this if you think there's merit 🙂

eiriktsarpalis commented 1 year ago

I'm not seeing a way to opt-out from that currently, as AFAIK "Serialization" is just a superset of "Metadata".

JsonSourceGenerationMode.Serialization actually refers to fast-path serialization, so it is in fact generating code that is independent of metadata. A mode that generates write-only metadata would likely come with diminishing returns size-wise, as it would probably save you a few delegates here and there but not much. This can also be controlled by modelling your DTOs to be serialization-only (e.g. by removing public constructors or property setters).

Sergio0694 commented 1 year ago

"This can also be controlled by modelling your DTOs to be serialization-only (e.g. by removing public constructors or property setters)."

Possibly a dumb question, but how would one create instances of these models to serialize then 😅

I will also say I understand the point about diminishing returns though, I think maybe it's less noticeable now given that there's still the issue with too much generated code not being shared and also non-public members and ignored ones still being included in the generated metadata (ie. a mix of this issue, #77675 and #66679). Once those are addressed I agree it's possible that such a serialization-only mode wouldn't seem worth it anymore. Either way, it does seem like something we should only potentially reconsider again once these other issues are addressed, so yeah makes sense to shelve that for now 🙂

layomia commented 1 year ago

Bringing in this conversation from https://github.com/dotnet/runtime/issues/74492.


From earlier discussion from @agocke / @eiriktsarpalis / @davidfowl

OK, this makes sense then, thanks.

And that should be a fairly common scenario, given that each JsonSerializerContext generates metadata for the entire transitive closure of its type dependencies.

This is good to know. Given that code size is a metric we're tracking for Native AOT we should keep an eye on how this scales out with larger apps. No action for now, though.

From @layomia

@eiriktsarpalis @agocke @davidfowl it might be good to think about it now. In the first release of the source generator, we decided not to introduce a global cache for sharing/abstracting metadata between assemblies. I think we should revisit the decision. Some potential challenges with this approach which would be good to keep in mind are documented here.

My idea at the time (toward the end of .NET 6 dev time) was to use Assembly.GetCallingAssembly as a mechanism to partition the cache but @jkotas pointed out that it was slow and unreliable. His thought was that the solution requires new thinking (in .NET 7 or after) across the ASP.NET, serializer(s), source generators, and IL linker.

Wanted to share this now in case it sparks some ideas.

From @jkotas

The callsite rewriting gives you fast reliable equivalent of Assembly.GetCallingAssembly, but I do not see how it can help to solve the versioning problems described in the blog post.

From @layomia

Gotcha. If we have a design to reduce the size of source generated code (https://github.com/dotnet/runtime/issues/77897) with the primary mechanism being shared caches between assemblies, do you think it would be incomplete if it doesn't solve the versioning issues? I recognize this area needs careful evaluation of the trade-offs (guided by .NET 8 goals).

From @jkotas

I think large projects have high probability of running into versioning issues. It is very common for large projects to run into versioning issues with Newtonsoft.Json today. We do not want source generators turn into a new source of versioning problems that are impossible to solve.

If we are considering schemes that allow sharing of source generated serializers, I think we need to design it such that there is always an option to make explicit choice about when to share and when not share the source generated serializers.

From @layomia

👍

Sergio0694 commented 1 year ago

Just to add more context on this - at least for our use case in the Store, the size increase was not caused by duplicate generated code, as we only have a single JSON context (I mean, two, but one is very small and with separate types entirely).

The issues in our cases are that:

  1. We just have a high number of models in general (which gets even bigger with the transitive closure, but this is expected).
  2. The generated code has a lot of duplicate stubs that could ideally be shared (eg. for generic collection types).
  3. The generated code also emits code (and transitive closure) for [JsonIgnore] properties/fields (#76919 tracks ignored/private fields, not sure if there's an issue tracked ignored properties too, though we did discuss that here a bit as well).

This is not to say the global cache isn't useful - it sure seems to be! Just pointing out that size regressions caused by source generators aren't necessarily caused by duplication for the same types across different assemblies, is all 🙂

Also, as I mentioned in https://github.com/dotnet/runtime/issues/74492#issuecomment-1383230948, we're doing a bunch of work from our side at least to mitigate these issues, and we'll share updated size regression numbers as soon as we have a new working prototype with the new changes. I'm personally very curious to see if/how much the size regression will have been mitigated with just tweaks on the JSON model side, without yet having any new source generator improvements (which I'm still very much looking forwards too though ahah) 😄

layomia commented 1 year ago

Just to add more context on this - at least for our use case in the Store, the size increase was not caused by duplicate generated code, as we only have a single JSON context (I mean, two, but one is very small and with separate types entirely).

Thanks for pointing this out. I agree with your assessment. It would be awesome to see what the size diff would be with your prototype.

Nice to have all the related concerns in one spot. I'm just excited that a global cache is now a possibility given that there are many aspects of the generator's design that could have benefited from it earlier on.

ghost commented 1 year ago

Tagging subscribers to 'size-reduction': @eerhardt, @SamMonoRT, @marek-safar See info in area-owners.md if you want to be subscribed.

Issue Details
We're currently working on migrating all our JSON serialization from Newtonsoft.Json to System.Text.Json in the Microsoft Store, and we're hitting some issues with respect to binary size after enabling the source generators for all our types. Our data models for JSON responses the Store client handles from service are about 260, which once annotated over a context, transitively cause a very large number of types (ie. `JsonTypeInfo` properties) to be generated, precisely 742 of them (this also includes eg. collection types with any of these data types as element type, etc.). That's a lot 😅 When trying this out, this caused our package size to regress from 59MB to about 76MB, so that's a 17MB (~29%) increase. This is for a final package with x86, x64 and Arm64 architectures, compiled fully AOT with .NET Native, with trimming enabled. For reference, with trimming disabled we baseline around 80MB without System.Text.Json, so I'd expect the version with System.Text.Json to be around the 100MB mark. We'll have to investigate whether the tradeoff for this size regression is worth the benefits of the source generators here (specifically for us, faster performance and less memory use when deserializing responses, and reliable behavior that's trimmer-safe), but I wanted to open this issue to investigate whether the generator can also be improved to reduce the metadata increase it causes. For context: all these models are generated in **metadata only mode**, so the size increase is with just the metadata support code. No serialization fast-path is generated. One thing I noticed is, we have dozens and dozens of files like this, generated due to the transitive closure of the types we have annotated:
Generated code (click to expand):
```csharp // #nullable enable annotations #nullable disable warnings // Suppress warnings about [Obsolete] member usage in generated code. #pragma warning disable CS0618 namespace MyProject { internal sealed partial class MyJsonSerializerContext { private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? _ICollectionSomeModelType; /// /// Defines the source generated JSON serialization contract metadata for a given type. /// public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> ICollectionSomeModelType { get => _ICollectionSomeModelType ??= Create_ICollectionSomeModelType(Options); } // Intentionally not a static method because we create a delegate to it. Invoking delegates to instance // methods is almost as fast as virtual calls. Static methods need to go through a shuffle thunk. private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> Create_ICollectionSomeModelType(global::System.Text.Json.JsonSerializerOptions options) { global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? jsonTypeInfo = null; global::System.Text.Json.Serialization.JsonConverter? customConverter; if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection))) != null) { jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo>(options, customConverter); } else { global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues>() { ObjectCreator = null, NumberHandling = default, SerializeHandler = null }; jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo, global::SomeModelType>(options, info); } return jsonTypeInfo; } } } ```
In this case, the `Create_ICollectionSomeModelType` method is effectively the same as dozens of other files, just with a different type argument. One idea for the generator to produce less code is for it to identify all these cases where the "default" path is used (ie. there's no custom converter known at compile time for this type), and just emit a shared stub just once, that the various properties can then reuse. For instance, it could be something like this:
Generated code (click to expand):
```csharp private global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> Create_ICollectionForType(global::System.Text.Json.JsonSerializerOptions options) { global::System.Text.Json.Serialization.Metadata.JsonTypeInfo>? jsonTypeInfo = null; global::System.Text.Json.Serialization.JsonConverter? customConverter; if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(global::System.Collections.Generic.ICollection))) != null) { jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateValueInfo>(options, customConverter); } else { global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues> info = new global::System.Text.Json.Serialization.Metadata.JsonCollectionInfoValues>() { ObjectCreator = null, NumberHandling = default, SerializeHandler = null }; jsonTypeInfo = global::System.Text.Json.Serialization.Metadata.JsonMetadataServices.CreateICollectionInfo, T>(options, info); } return jsonTypeInfo; } ```
And then the property would just do: ```csharp public global::System.Text.Json.Serialization.Metadata.JsonTypeInfo> ICollectionSomeModelType { get => _ICollectionSomeModelType ??= Create_ICollectionForType(Options); } ``` This trick can work for all "default" cases for several collection types as well (maybe for some concrete types as well, would have to check). Point is: the generator should identify all generated methods that have a high amount of shared code with others, and if possible rewrite them to be a single shared, generic method that all other consumers can invoke, instead of having their own copy. This would likely give us very nice improvements in terms of code size, if you consider this multiplied over several dozens of types. Let me know if we want a separate meta issue to track general size improvements for the generators, or if we just want to use this one as reference 🙂 cc. @eiriktsarpalis ### Known Workarounds Don't use the generators at all. Not desireable due to performance and trimming concerns. ### Configuration - System.Text.Json 7.0.0-rc.2.22472.3
Author: Sergio0694
Assignees: -
Labels: `enhancement`, `area-System.Text.Json`, `size-reduction`, `source-generator`, `partner-impact`
Milestone: 8.0.0
Sergio0694 commented 1 year ago

Hey there! We're making good progress with the System.Text.Json migration in the Microsoft Store and we're finally ready to start testing out a full migration of our entire backend. We've done a bunch of changes to reduce the number of types to generated metadata for, to avoid unnecessary combinations of types (eg. we unified all of our collection types to just always use List<T> and Dictionary<TKey, TValue> everywhere in our models, to avoid cases where one would use some IEnumerable<T> and another model would use some ICollection<T> or another collection type), etc.

I re-run our pipeline to get updated size diffs (consider this is for x86 + x64 + Arm64):

Delta: +7.842 KB (+12.65%).

@eiriktsarpalis I'm happy to share a new minimal repro (internally) with our updated packages if anyone wants to take a look at the generated code to identify more optimization opportunities, and I'm also more than happy to try out any preview builds from the 8.0.0 branch and report back updated size diffs if you also wanted to use the Store as another reference point to track the impact of the new size saving work going into the next System.Text.Json release 🙂

SamMonoRT commented 1 year ago

/cc @lewing @kotlarmilos

Sergio0694 commented 1 year ago

Tested again after enabling trimming for all our JSON models (we had to previous keep metadata there due to Newtonsoft.Json, but that's no longer needed now that we're using the System.Text.Json source generators), updated results:

Delta: +6.291 KB (+10.15%).

So that's a nice improvement already, though still a noticeable regression when enabling the source generators.


FYI @krwq we have a total of 65 generated JsonTypeInfo<T> properties for List<T> instantiations, and those all pretty much have the same exact Create_XXX generated stub, with the only difference being the T used for the ObjectCreator property and the type argument used for JsonMetadataServices.CreateListInfo. Like we discussed previously, it does seem like we could have a single generic stub for all these properties, and then just use that for all 65 of them, which I think should result in a noticeable size increase (especially because 64 of those 64 instantiations have T being a reference type, so the same compiled code will be shared across all of them as well) 🙂

Basically this:

private JsonTypeInfo<List<T>> CreateListTypeInfo<T>(JsonSerializerOptions options, bool makeReadOnly)
{
    JsonTypeInfo<List<T>>? jsonTypeInfo = null;
    JsonConverter? customConverter;

    if (options.Converters.Count > 0 && (customConverter = GetRuntimeProvidedCustomConverter(options, typeof(List<T>))) != null)
    {
        jsonTypeInfo = JsonMetadataServices.CreateValueInfo<List<T>>(options, customConverter);
    }
    else
    {
        JsonCollectionInfoValues<List<T>> info = new JsonCollectionInfoValues<List<T>>()
        {
            ObjectCreator = () => new List<T>(),
            NumberHandling = default,
            SerializeHandler = null
        };

        jsonTypeInfo = JsonMetadataServices.CreateListInfo<List<T>, T>(options, info);

    }

    if (makeReadOnly)
    {
        jsonTypeInfo.MakeReadOnly();
    }

    return jsonTypeInfo;
}
eiriktsarpalis commented 1 year ago

@Sergio0694 I believe most of the immediate size concerns have been addressed right? I'm going to close this issue, but feel free to reopen if other action items still exist.

Sergio0694 commented 1 year ago

Yup, sounds good! Still a regression (which is by design) compared to no generators, but it's much better than before 🙂 I'll run a couple pipelines and leave an updated size diff here using the latest nightly build just for future reference. Thank you for all the help!! 🙌