OData / odata.net

ODataLib: Open Data Protocol - .NET Libraries and Frameworks
https://docs.microsoft.com/odata

Set `ODataUtf8JsonWriter` as default `JsonWriter` #2980

Closed habbes closed 2 weeks ago

habbes commented 1 month ago

Issues

This pull request fixes #2822

Description

This PR:

Both of these changes aim to improve default serialization performance. ODataUtf8JsonWriter has demonstrated better performance and memory efficiency than the current default JsonWriter, both in benchmarks and in production workloads. One of the major blockers to making it the default was its lack of streaming support, but that has been addressed by #2880.

The main concern about this change is that the two writers produce different JSON text, mainly in how they escape characters.

In any case, while the output is different, both are valid JSON and semantically equivalent. Compliant JSON parsers should interpret them the same way.
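To make the equivalence claim concrete, here is a minimal sketch that parses the sample strings quoted in this PR with Python's standard `json` module and compares the decoded values. Python is used here only as a stand-in for "a compliant JSON parser"; the helper name `same_value` is made up for illustration and is not part of ODataLib.

```python
import json

# Sample JSON string literals copied from the comparison in this PR.
json_writer_output = r'"Cust1 \ud800\udc05 \u00e4"'   # lowercase hex digits
utf8_writer_output = r'"Cust1 \uD800\uDC05 \u00E4"'   # uppercase hex digits

relaxed_encoder_output = '"CityA1 \'A1\' + 3 <>"'
default_encoder_output = r'"CityA1 \u0027A1\u0027 \u002B 3 \u003C\u003E"'


def same_value(a: str, b: str) -> bool:
    """Return True when two JSON texts decode to the same value (hypothetical helper)."""
    return json.loads(a) == json.loads(b)


# The raw text differs, but the decoded strings are identical.
raw_text_differs = json_writer_output != utf8_writer_output
hex_case_equivalent = same_value(json_writer_output, utf8_writer_output)
encoder_equivalent = same_value(relaxed_encoder_output, default_encoder_output)
```

A client that compares raw response text (for example in snapshot tests) would still see a difference, which is why the change is called out here.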

Differences between ODataUtf8JsonWriter and JsonWriter

| Category | Description | Examples |
| --- | --- | --- |
| String Escaping | Utf8JsonWriter escapes more characters in a string than the default JsonWriter. By default, Utf8JsonWriter escapes even HTML-unsafe characters like `<`. However, you can override that by using the UnsafeRelaxedJsonEscaping encoder as documented here: https://learn.microsoft.com/en-us/odata/odatalib/using-ut8jsonwriter-for-better-performance#choosing-a-javascriptencoder. That said, even with this encoder, there will be some differences in character escaping, but all outputs are valid UTF-8 encoded strings. | Hex digit case: Utf8JsonWriter uses uppercase letters for Unicode code points, JsonWriter uses lowercase letters (JsonWriter: `"Cust1 \ud800\udc05 \u00e4"` vs Utf8JsonWriter: `"Cust1 \uD800\uDC05 \u00E4"`). Encoder differences: JsonWriter: `"CityA1 'A1' + 3 <>"` vs Utf8JsonWriter with relaxed encoder: `"CityA1 'A1' + 3 <>"` vs Utf8JsonWriter with default encoder: `"CityA1 \u0027A1\u0027 \u002B 3 \u003C\u003E"` |
| Number formatting | Utf8JsonWriter serializes the decimal 1.0M as 1.0, but JsonWriter serializes it as 1 (the opposite of the previous scenario). | JsonWriter: `1` vs Utf8JsonWriter: `1.0` |
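
As a rough illustration of the number formatting difference, the sketch below parses both outputs in Python. With exact decimal parsing the two values compare equal, but a parser that infers types from the text may see an integer in one case and a floating-point number in the other. This only shows how one particular client-side parser behaves; it says nothing about how either writer is implemented.

```python
import json
from decimal import Decimal

json_writer_number = "1"     # JsonWriter output for the decimal 1.0M, per this PR
utf8_writer_number = "1.0"   # ODataUtf8JsonWriter output for the same value

# Exact decimal parsing: both texts denote the same numeric value.
as_decimal_a = json.loads(json_writer_number, parse_int=Decimal, parse_float=Decimal)
as_decimal_b = json.loads(utf8_writer_number, parse_int=Decimal, parse_float=Decimal)
numerically_equal = as_decimal_a == as_decimal_b

# Default parsing: Python infers int for "1" and float for "1.0".
default_types = (
    type(json.loads(json_writer_number)),
    type(json.loads(utf8_writer_number)),
)
```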

We also used to have a difference in DateTimeOffset formatting: Utf8JsonWriter used the +00:00 timezone suffix when the timezone offset was 0 (e.g. 2022-11-09T09:42:30+00:00), whereas JsonWriter uses Z (e.g. 2022-11-09T09:42:30Z). We addressed this in a past PR after getting feedback that this difference was more likely to break clients, particularly those that do not strictly follow the standard. Customers have also raised this type of difference in the past. Now both JsonWriter and ODataUtf8JsonWriter use the Z suffix.
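The following sketch shows why the two suffixes are equivalent to a standards-aware parser, yet can still break a stricter client. It uses Python's `datetime.strptime` (Python 3.7 or later, where `%z` accepts both `Z` and `+00:00`). The `naive_client_accepts` function is a hypothetical example of a client that only looks for a literal `Z`; it does not represent any specific consumer.

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

z_text = "2022-11-09T09:42:30Z"
offset_text = "2022-11-09T09:42:30+00:00"

# A standards-aware parser maps both texts to the same instant.
with_z = datetime.strptime(z_text, ISO_FORMAT)
with_offset = datetime.strptime(offset_text, ISO_FORMAT)
same_instant = with_z == with_offset


def naive_client_accepts(text: str) -> bool:
    """Hypothetical strict client that only accepts a literal 'Z' suffix."""
    return text.endswith("Z")
```

Here `naive_client_accepts(offset_text)` is false even though both strings denote the same instant, which is the kind of breakage the feedback described.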

Justification for changing the default encoder to JavaScriptEncoder.UnsafeRelaxedJsonEscaping:

Main changes:

Checklist (Uncheck if it is not completed)

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.