dotnet / docs

This repository contains .NET Documentation.
https://learn.microsoft.com/dotnet
Creative Commons Attribution 4.0 International
4.24k stars 5.88k forks source link

System.Text.Json SourceGenerator should have better docs and public API #37370

Open piskov opened 1 year ago

piskov commented 1 year ago

(I apologize in advance if this is the wrong place for such kind of feedback. If so, would appreciate any advice where to put this)

Docs issue

The following two links are basically the only docs on JSON source generators:

The issue is that they describe one simple class consisting of basic .net types (e. g. integers or bools) that is nowhere near the real-world usage. Web APIs have a lot of DTOs (data transfer objects) which are used in multiple ways in controllers (get, put, etc.) and can be nested.

Here are just a handful of questions that are not answered in docs:

  1. Suppose one has dozens (or hundreds) DTOs in Web API. How one should add source gen for them? One context with dozens (hunderds) of attributes for each DTO? Separate contexts? Should I have source gen for all DTOs or only some?

    It would be great to see an example with at least 2-3 DTOs, not a single WeatherForecast. And guidelines/tips how to add source generation for the entire API with dozens (hundreds) of DTOs.

  2. What to do with nested DTOs. E. g. AddressDto inside PersonDto. How to configure source gen for that case? What would happen if nested DTO doesn’t have source gen but a parent does? What would happen if parent DTO has source gen, but nested doesn’t? What if I have a DTO that has 3 nested levels: CompanyDto that has EmployeeDto[] where each one has AddressDto, and source gen is configured only for CompanyDto and AddressDto?

  3. How one should handle collections of different types? What types of collections should be marked for source generation? Suppose I have these 3 cases:

    • nested IReadOnlyCollection<AddressDto> Addresses property inside PersonDto;
    • some controller to GET addresses that returns ActionResult<IEnumerable<AddressDto>>;
    • some controller to POST AddressDto.

    Should I add to source gen all three: IReadOnlyCollection<AddressDto>, IEnumerable<AddressDto>, AddressDto? Or if just IEnumerable<AddressDto> andAddressDto would suffice?

  4. What about controller action return type Task<ActionResult<IEnumerable<OrderDto>>>: what should be added in this case for source gen? What if the actual variable that is returned inside the controller is List<OrderDto> or HashSet<OrderDto> (while the controller’s action method signature is Task<ActionResult<IEnumerable<OrderDto>>>) = will the IEnumerable<OrderDto> mentioned in source-gen suffice?

  5. Is there any way to know if some cases are missed? E. g. if OrderDto is added, but IEnumerable<OrderDto> is not? To paraphrase, is there any way have some warnings during at least runtime(?) if reflection instead of source gen stuff have been used for (de)serialization in API controllers.

Overall UX thoughts

One would think who would not like to have free performance gains of using JSON source gen in Web APIs. Unfortunately current implementation feels like something one should wait for better times before implementing in their code. Something that was half-baked and rushed as a need for some specific case and that is not ready for actual API projects — too much manual boilerplate and not error-prone. Though I may be proven utterly wrong given more detailed docs/guides emerge.

One could ask for an option to apply JsonSerializable attributes directly to DTO classes instead of some separate context, or many other things like not having to think about collections (e. g. adding OrderDto would automatically handle cases for IEnumerable).

Of course, ideal case would be not to think about this at all. One possible solution is to have another source generator or T4 template (akin to Entity Framework DB first) that would automatically generate required “json source generator” boilerplate for all .cs files located inside project DTO folder. Is it really the intended way to manually write hundreds of [JsonSerializable(typeof(insert_every_dto_type_name_here_and_collection_permutation))]?


Document Details

Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

Issue Details
(I apologize in advance if this is the wrong place for such kind of feedback. If so, would appreciate any advice where to put this) ## Docs issue The following two links are basically the only docs on JSON source generators: - [How to use source generation in System.Text.Json](https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/source-generation) - [Try the new System.Text.Json source generator](https://devblogs.microsoft.com/dotnet/try-the-new-system-text-json-source-generator/) The issue is that they describe one simple class consisting of basic .net types (e. g. integers or bools) that is nowhere near the real-world usage. Web APIs have a lot of DTOs (data transfer objects) which are used in multiple ways in controllers (get, put, etc.) and can be nested. Here are just a handful of questions that are not answered in docs: 1. Suppose one has dozens (or hundreds) DTOs in Web API. How one should add source gen for them? One context with dozens (hunderds) of attributes for each DTO? Separate contexts? Should I have source gen for all DTOs or only some? It would be great to see an example with at least 2-3 DTOs, not a single `WeatherForecast`. And guidelines/tips how to add source generation for the entire API with dozens (hundreds) of DTOs. 2. What to do with nested DTOs. E. g. `AddressDto` inside `PersonDto`. How to configure source gen for that case? What would happen if nested DTO doesn’t have source gen but a parent does? What would happen if parent DTO has source gen, but nested doesn’t? What if I have a DTO that has 3 nested levels: `CompanyDto` that has `EmployeeDto[]` where each one has `AddressDto`, and source gen is configured only for `CompanyDto` and `AddressDto`? 3. How one should handle collections of different types? What types of collections should be marked for source generation? Suppose I have these 3 cases: - nested `IReadOnlyCollection Addresses` property inside `PersonDto`; - some controller to GET addresses that returns `ActionResult>`; - some controller to POST `AddressDto`. Should I add to source gen all three: `IReadOnlyCollection`, `IEnumerable`, `AddressDto`? Or if just `IEnumerable` and`AddressDto` would suffice? 4. What about controller action return type `Task>>`: what should be added in this case for source gen? What if the actual variable that is returned inside the controller is `List` or `HashSet` (while the controller’s action method signature is `Task>>`) = will the `IEnumerable` mentioned in source-gen suffice? 5. Is there any way to know if some cases are missed? E. g. if `OrderDto` is added, but `IEnumerable` is not? To paraphrase, is there any way have some warnings during at least runtime(?) if reflection instead of source gen stuff have been used for (de)serialization in API controllers. ## Overall UX thoughts One would think who would not like to have free performance gains of using JSON source gen in Web APIs. Unfortunately current implementation feels like something one should wait for better times before implementing in their code. Something that was half-baked and rushed as a need for some specific case and that is not ready for actual API projects — too much manual boilerplate and not error-prone. Though I may be proven utterly wrong given more detailed docs/guides emerge. One could ask for an option to apply `JsonSerializable` attributes directly to DTO classes instead of some separate context, or many other things like not having to think about collections (e. g. adding OrderDto would automatically handle cases for IEnumerable). Of course, ideal case would be not to think about this at all. One possible solution is to have another source generator or T4 template (akin to Entity Framework DB first) that would automatically generate required “son source generator” boilerplate for all .cs files located inside project DTO folder. Is it really the intended way to manually write hundreds of `[JsonSerializable(typeof(insert_every_dto_type_name_here_and_collection_permutation))]`?
Author: piskov
Assignees: -
Labels: `area-System.Text.Json`
Milestone: -
eiriktsarpalis commented 1 year ago

Hi @piskov, and thank you for the detailed feedback.

I can't speak for the docs team, but in my view the purpose of conceptual documentation is to provide an introduction to beginners rather than give the user a complete tour of all functionality and its permutations. As such, the examples offered are intentionally simplistic with the intention of exploring a specific feature in isolation. For comprehensive documentation targeted to more experienced users we tend to refer folks to our API documentation.

As regards to your specific questions, I'll try to answer to answer them here:

Suppose one has dozens (or hundreds) DTOs in Web API. How one should add source gen for them? One context with dozens (hunderds) of attributes for each DTO? Separate contexts?

Either approach works, ultimately it depends on how your structure your dependencies. Assuming you group all your DTOs in one project it's probably simpler to put everything in one context, YMMV.

It would be great to see an example with at least 2-3 DTOs, not a single WeatherForecast. And guidelines/tips how to add source generation for the entire API with dozens (hundreds) of DTOs.

The general assumption is that all features that we ship are additive -- in other words the same approach that works with 1 DTO works with 2-3 DTOs or hundreds of DTOs. Assuming there's a different approach, we will document it. If something breaks down with $n$ DTOs, please file an issue, it is most likely a product bug.

What to do with nested DTOs. E. g. AddressDto inside PersonDto. How to configure source gen for that case? What would happen if nested DTO doesn’t have source gen but a parent does? What would happen if parent DTO has source gen, but nested doesn’t? What if I have a DTO that has 3 nested levels: CompanyDto that has EmployeeDto[] where each one has AddressDto, and source gen is configured only for CompanyDto and AddressDto?

When you declare the WeatherForecast type as serializable, it has to generate the serialization contract for every value that it contains, namely int, string, DateTime and List<object>. By extension, the same concern applies to the example you are describing. Put more formally, declaring [JsonSerializable(typeof(MyType))] will generate the serialization contract for the entire transitive closure of MyType.

Should I add to source gen all three: IReadOnlyCollection, IEnumerable, AddressDto?

The short answer is yes, and it has to do with how the source generator works when compared to reflection. The upcoming release in .NET 8 somewhat relaxes that constraint but it is limited to serialization scenaria. You'd still need to register every type you want to deserialize.

Is there any way to know if some cases are missed? E. g. if OrderDto is added, but IEnumerable is not?

It is something not possible with the current model, but we're actively considering potential ways to improve this in future releases. cc @DamianEdwards and @eerhardt who are also interested in this problem.

One would think who would not like to have free performance gains of using JSON source gen in Web APIs. Unfortunately current implementation feels like something one should wait for better times before implementing in their code. Something that was half-baked and rushed as a need for some specific case and that is not ready for actual API projects — too much manual boilerplate and not error-prone. Though I may be proven utterly wrong given more detailed docs/guides emerge.

I should clarify that the primary benefit of using source generators is not performance per se, but rather it is enabling trimmed applications and Native AOT. This is explicitly a trade-off for the convenience and flexibility of using reflection in Core CLR. Actual performance benefits are marginal (with the notable exception of fast-path serialization) once you account for startup times.

Depending on what type of application you are trying to deploy, it may well be that source generation isn't the right approach for you. Please refer to this document for more details.

eiriktsarpalis commented 1 year ago

Related to https://github.com/dotnet/runtime/issues/91495

piskov commented 1 year ago

Thank you @eiriktsarpalis for providing some clarity — much appreciated.

primary benefit ‹…› is not performance per se, but rather it is enabling trimmed applications and Native AOT

If these are the only primary use-cases, it would be great news for me — currently I don’t care about either applications.

Actual performance benefits are marginal (with the notable exception of fast-path serialization) once you account for startup times.

In 2021 @layomia wrote about 40% RPS increase just on the source gen merit alone. That sounded like a considerable win and worth of using the generators (and that was the first year I was confused about boilerplate and the need of explicitly listing all permutations and distinct top-level DTOs).

With “marginal actual performance benefits” are you saying that reported 40% RPS gap in .NET 6 is much narrower in .NET 8? Or, more probably, that for more “ordinary” (as in not somewhat handpicked narrow use-case of returning mostly from in-memory cache) back to earth REST Web API, the performance difference is marginal and doesn’t matter? Then again, below that RPS data were some 2× reductions in memory footprint with source generators (utf-8 vs utf-16 stuff?)/

If marginal it is — again, great: I will leave this feature off my API’s bucket list :-)

the purpose of conceptual documentation is to provide an introduction to beginners

I wouldn’t call those docs conceptual or beginner level as there are a lot of useful performance/advanced level sections both in ASP and EF docs. Also if not there, where one should describe holistic approach instead of focusing on a single class or method :-)

My confusion for two years was due to combination of such “sourcery”-level constraints like needing to specify basically what seemed the same class multiple times, not knowing whether there should be one new context per type, silent fallback to reflection (I remember reading some release notes that you changed that later), handling cases of same type being mentioned twice in attributes, or being skipped or forgotten to be added by mistake, etc.

It’s not that some topics are advanced but that I really coudn’t tell how to apply this to my real-world API. And neither docs not some blog posts from independent “bloggers” via extensive web search helped (same one simple DTO example in every one of them).

I really felt myself inside the meme about drawing the rest of the goddamn owl missing all but the very first step of opening the IDE :-) The only case when it can be a simple two-step process is when it’s completely automated (source interceptors, t4 template parsing DTO folder, or something like that).

One more question remains, If I may:

Put more formally, declaring [JsonSerializable(typeof(MyType))] will generate the serialization contract for the entire transitive closure of MyType.

Suppose, I have PersonDto that has nested AddressDto property. I understand that if I have /GET PersonDto controller action I only need to add the source gen attribute for the PersonDto (no need to explicitly write second attribute for AddressDto as it is nested and will be generated automatically). But what if — in addition to that — I also have /GET AddressDto in the second controller. Will it still work with only PersonDto mentioned in attribute?

To put it more formally: yes, nested types can be ommited in explicit attributes, but can they still be ommited if the same type (Address) is both nested (in Person) and also used as a parent type by itself in some other controller (e. g. /POST Address)?

eiriktsarpalis commented 1 year ago

Actual performance benefits are marginal (with the notable exception of fast-path serialization) once you account for startup times.

In 2021 @layomia wrote about 40% RPS increase just on the source gen merit alone. That sounded like a considerable win and worth of using the generators (and that was the first year I was confused about boilerplate and the need of explicitly listing all permutations and distinct top-level DTOs).

I believe the particular techempower benchmark is using fast-path serialization only, which should justify the quoted performance improvement. It doesn't move the needle much if you use metadata-based serialization or deserialization in general. My wider point though is that the source generator does come with usability trade-offs, so depending on the complexity or publish target of your application, performance improvements alone might not suffice to justify using it.

But what if — in addition to that — I also have /GET AddressDto in the second controller. Will it still work with only PersonDto mentioned in attribute?

Yes. It has to work because it is required for the containing type to work.

fenryo237 commented 7 months ago

I just want to say thank you to @piskov for this discussion. I was litteraly feeling the same way. I am trying to replace NewtonSoft by System.Text.Json for a gaming dev and we have tons of nested object...

And thanks for the answers @eiriktsarpalis it helps

simon10says commented 5 months ago

I would also like to thank @piskov for this discussion and @eiriktsarpalis for your helpful replies. I'm also currently in this "oh no" situation where I've to refactor hundreds of DTO for Json source generator and am trying to strategize how to do it.

On the previous reply from @eiriktsarpalis :

Should I add to source gen all three: IReadOnlyCollection, IEnumerable, AddressDto?

The short answer is yes, and it has to do with how the source generator works when compared to reflection. The upcoming release in .NET 8 somewhat relaxes that constraint but it is limited to serialization scenaria. You'd still need to register every type you want to deserialize.

I'm wondering what would be the best approach to simplify it. I could:

  1. Painfully search all collection structures used and each add a [JsonSerializable(typeof('struct'<AddressDto>))]. This is painful because of the multiple combinations - IList, IList<AddressDto>, ICollection, List<AddressDto> , Collection<AddressDto> and etc.

  2. Add a [JsonSerializable(typeof(IList)]. Create a helper static Serialize() and static Deserialize() as follows:

    public static string Serialize<T>(T value)
    {
        if (value is IList valueList)
        {
            return JsonSerializer.Serialize(valueList, typeof(IList), MyJsonSerializerContext.Default);
        }
    
        return string.Empty;
    }

    And all collections that implement IList will be serialized using JsonTypeInfo<IList>