dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.96k stars 4.65k forks source link

Allow changing the newline sequence for indenting using Utf8JsonWriter #54715

Open SteveKCheng opened 3 years ago

SteveKCheng commented 3 years ago

Background and Motivation

Currently the newline sequence used by Utf8JsonWriter depends on the running platform, obtained from Environment.NewLine.

It would be useful to be able to set it explicitly so that JSON output can be reproduced exactly no matter what platform the .NET program is running under. It is essential when the JSON output bytes are to be compared directly or through a hash, e.g. as a digital signature or for regression-testing expected outputs.

Proposed API

namespace System.Text.Json
{
    public struct JsonWriterOptions
    {
+        public string? NewLine { get; set; }
    }
}

The default value is null (backwards compatible), which means to take the value from Environment.NewLine. Otherwise the specified string is used as the newline sequence, as in the TextWriter.NewLine property.

Usage Examples


using var writer = new Utf8JsonWriter(outputStream, new JsonWriterOptions
{
    Indented = true,
    NewLine = "\n"
};

Alternatives

Currently as Utf8JsonWriter does not offer this functionality, I have to use Newtonsoft's JSON writer instead --- which relies on TextWriter that has a customizable NewLine property. It seems fairly difficult, and obviously inefficient, to write a "filter" on a Stream to convert the platform-specific newline byte sequence.

Risks

I think it is low risk for the most common cases. There is no change to behavior of existing code that does not set the JsonWriterOptions.NewLine property. Utf8JsonWriter's code is already parameterized on Environment.NewLine and this proposal is just asking we stop hard-coding the use of a global property.

What if the arbitrary newline sequence contains non-ASCII characters? Or it results in invalid JSON?

My first thought is that such characters should still be supported for completeness (otherwise we'd have to define what the behavior would be) even they might cause slightly slower execution. And it would be consistent with TextWriter's API with no friction for the user.

But perhaps an enumeration (or boolean?) allowing only for the standard choices of newline sequences would work too. Currently, Utf8JsonWriter's code assumes Environment.NewLine is either "\r\n" or "\n". Supporting the choice between these two would be good enough to satisfy the motivation above.

ghost commented 3 years ago

Tagging subscribers to this area: @eiriktsarpalis, @layomia See info in area-owners.md if you want to be subscribed.

Issue Details
## Background and Motivation Currently the newline sequence used by ``Utf8JsonWriter`` depends on the running platform, obtained from ``Environment.NewLine``. It would be useful to be able to set it explicitly so that JSON output can be reproduced exactly no matter what platform the .NET program is running under. It is essential when the JSON output bytes are to be compared directly or through a hash, e.g. as a digital signature or for regression-testing expected outputs. ## Proposed API ```diff namespace System.Text.Json { public struct JsonWriterOptions { + public string? NewLine { get; set; } } } ``` The default value is ``null`` (backwards compatible), which means to take the value from ``Environment.NewLine``. Otherwise the specified string is used as the newline sequence, as in the ``TextWriter.NewLine`` property. ## Usage Examples ``` C# using var writer = new Utf8JsonWriter(outputStream, new JsonWriterOptions { Indented = true, NewLine = "\n" }; ``` ## Alternatives Currently as ``Utf8JsonWriter`` does not offer this functionality, I have to use Newtonsoft's JSON writer instead --- which relies on ``TextWriter`` which has a customizable ``NewLine`` property. It seems fairly difficult, and obviously inefficient, to try to write a "filter" on a ``Stream`` to convert the platform-specific newline byte sequence. ## Risks I think it is low risk for the most common cases. There is no change to behavior of existing code that does not set the ``JsonWriterOptions.NewLine`` property. ``Utf8JsonWriter``'s code is already parameterized on ``Environment.NewLine`` and this proposal is just asking we stop hard-coding the use of a global property. What if the newline sequence contains non-ASCII characters? I think such characters should still be supported for completeness (otherwise we'd have to define what the behavior would be) even they might cause slightly slower execution. An enumeration could be used instead, allowing only for the standard choices of newline sequences, but an enumeration is rather noisy (I think in this case) and it would not be consistent with the existing API on ``TextWriter``.
Author: SteveKCheng
Assignees: -
Labels: `api-suggestion`, `area-System.Text.Json`, `untriaged`
Milestone: -
GrabYourPitchforks commented 3 years ago

I don't have any feedback on the customizing newline scenario - will leave that for the JSON team. But I wanted to focus on this comment:

It is essential when the JSON output bytes are to be compared directly or through a hash, e.g. as a digital signature or for regression-testing expected outputs.

Please be advised that different major versions of System.Text.Json are not guaranteed to have byte-for-byte identical output with one another. This could be because the .NET runtime itself changes how some values are encoded (like how .NET Framework and .NET 5 have different string representations for double), it could be because characters like '€' may or may not be escaped by default, or it could be some other factor.

If you rely on the contents having a very specific byte-for-byte representation, please always test each new version of System.Text.Json (or its dependencies) or each new version of the .NET runtime in an integration environment before deploying to production. This will allow you to catch such discrepancies early.

Hope this helps!

SteveKCheng commented 3 years ago

@GrabYourPitchforks Just want to comment: I am fine with the caveat that byte-for-byte comparison only works with the same version of the run-time & libraries. The use case I have right now is regression-testing by comparing expected outputs where some developers might run the program on Windows while the automated periodic process is running on Linux.

jeffhandley commented 2 years ago

@SteveKCheng We know that we aren't going to get to this feature during .NET 7, and compared to many of the other issues/features in our System.Text.Json backlog, I don't expect we'll get to this for the foreseeable future. To reflect this, I'm going to close the issue.

For the regression testing scenario you described, I recommend normalizing the newlines before comparison. As was previously called out, other direct comparisons or consumption for a hash, a digital signature, or other purpose would not be reliable without normalization either.

jeffhandley commented 2 years ago

Reopening based on additional feedback from @damieng that there are scenarios where additional control of the written content is valuable.

@damieng if you could, please add a comment citing your scenarios so that we can consider them and if we proceed on this use them as acceptance criteria.