Open habbes opened 2 years ago
Tagging subscribers to this area: @dotnet/area-system-text-encodings-web See info in area-owners.md if you want to be subscribed.
| Author: | habbes |
|---|---|
| Assignees: | - |
| Labels: | `api-suggestion`, `area-System.Text.Encodings.Web`, `untriaged` |
| Milestone: | - |
If this feature request is approved, I would be happy to contribute the changes (with proper guidance).
@dotnet/area-system-text-encodings-web it looks like triage missed this. could we share whether this is something we'd consider? @GrabYourPitchforks may have an opinion here.
Largely a duplicate of https://github.com/dotnet/runtime/issues/54193.
> But the HTML characters were still escaped. It seems like `JavaScriptEncoder.Create` is hardwired to forbid HTML characters even when the user explicitly allows them.
That is correct. The issue above (see also https://github.com/dotnet/runtime/issues/1564#issuecomment-504780719) goes into this in more detail. That's also why we have the "unsafe relaxed" encoder. We provide two switches: give me the safest thing, or give me something which closely aligns with the spec. We're not terribly interested in providing "mix-and-match" style functionality out of the box since these go down the path of very edge case scenarios.
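For readers who want to see the two switches side by side, here is a minimal sketch. The `EncoderComparison` class and its method names are made up for illustration; the `System.Text.Json` and `System.Text.Encodings.Web` calls are the public ones. The third helper reproduces the attempt described in the issue: explicitly allowing `<` and `>` through `TextEncoderSettings`.

```csharp
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical helper class, not part of any library.
public static class EncoderComparison
{
    // Serializes a single string value with the given encoder.
    public static string Serialize(string value, JavaScriptEncoder encoder) =>
        JsonSerializer.Serialize(value, new JsonSerializerOptions { Encoder = encoder });

    // "Safest" switch: escapes HTML-sensitive and non-ASCII characters.
    public static string WithDefault(string value) =>
        Serialize(value, JavaScriptEncoder.Default);

    // "Close to the spec" switch: leaves '<', '>' and non-ASCII characters alone.
    public static string WithRelaxed(string value) =>
        Serialize(value, JavaScriptEncoder.UnsafeRelaxedJsonEscaping);

    // The attempt from the issue: allow Basic Latin and explicitly allow '<' and '>'.
    // As discussed in this thread, Create still forbids the HTML-sensitive characters.
    public static string WithCustomSettings(string value)
    {
        var settings = new TextEncoderSettings(UnicodeRanges.BasicLatin);
        settings.AllowCharacters('<', '>');
        return Serialize(value, JavaScriptEncoder.Create(settings));
    }
}
```

Calling all three helpers with an input such as `"<é"` should show the gap the issue describes: the relaxed encoder leaves both characters unescaped, while the other two escape `<` regardless of the settings passed in.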
In OP's example, the reason they want this functionality is that they want an absolute guarantee of \<some output sequence> given \<some input sequence>. But the encoders have never guaranteed output stability for any given input. We change the logic every release to follow best practice and to account for updates to the Unicode spec. Even in .NET Framework - which has extremely strict compat requirements! - we changed how characters were encoded every so often. The API OP wants us to expose would not address their issue of "we want this exact sequence of bytes to be emitted given this input." That's simply a guarantee we intentionally do not fulfill.
If you want to emit a specific sequence of bytes, you can subclass the encoder and override the appropriate members. The issue I linked above contains sample code showing how to do this. The difficulty given the current design is that many of the workhorse methods require you to write unsafe code.
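As a rough illustration of the subclassing route (this is not the sample from the linked issue), the sketch below overrides the four abstract members of `JavaScriptEncoder`. The class name `AsciiOnlyJsonEncoder` and the escaping rules (control characters, non-ASCII, `"` and `\`) are assumptions modelled on the OData rules described in the issue. It needs `AllowUnsafeBlocks` enabled in the project, which is the difficulty mentioned above.

```csharp
using System;
using System.Text.Encodings.Web;

// Hypothetical encoder: escapes control chars (< 0x20), non-ASCII chars (> 0x7F),
// the double quote, and the backslash. Everything else, including '<' and '>',
// passes through unescaped.
public sealed unsafe class AsciiOnlyJsonEncoder : JavaScriptEncoder
{
    // Worst case is one six-char \uXXXX escape per UTF-16 code unit.
    public override int MaxOutputCharactersPerInputCharacter => 6;

    public override bool WillEncode(int unicodeScalar) =>
        unicodeScalar < 0x20 || unicodeScalar > 0x7F
        || unicodeScalar == '"' || unicodeScalar == '\\';

    // Returns the index of the first char that needs escaping, or -1 if none does.
    // Surrogate halves are above 0x7F, so they are always flagged.
    public override int FindFirstCharacterToEncode(char* text, int textLength)
    {
        for (int i = 0; i < textLength; i++)
        {
            if (WillEncode(text[i])) return i;
        }
        return -1;
    }

    public override bool TryEncodeUnicodeScalar(
        int unicodeScalar, char* buffer, int bufferLength, out int numberOfCharactersWritten)
    {
        string escaped = Escape(unicodeScalar);
        if (escaped.Length > bufferLength)
        {
            numberOfCharactersWritten = 0;
            return false;
        }
        for (int i = 0; i < escaped.Length; i++) buffer[i] = escaped[i];
        numberOfCharactersWritten = escaped.Length;
        return true;
    }

    private static string Escape(int scalar)
    {
        switch (scalar)
        {
            case '"': return "\\\"";
            case '\\': return "\\\\";
            case '\n': return "\\n";
            case '\r': return "\\r";
            case '\t': return "\\t";
            case '\b': return "\\b";
            case '\f': return "\\f";
        }
        if (scalar <= 0xFFFF) return "\\u" + scalar.ToString("X4");
        // Supplementary code points are written as a surrogate pair of \uXXXX escapes.
        string pair = char.ConvertFromUtf32(scalar);
        return "\\u" + ((int)pair[0]).ToString("X4") + "\\u" + ((int)pair[1]).ToString("X4");
    }
}
```

An instance can be passed wherever a `JavaScriptEncoder` is accepted, for example `new JsonSerializerOptions { Encoder = new AsciiOnlyJsonEncoder() }`. The caveat above still applies: this controls what this subclass emits, but the built-in encoders give no output-stability guarantee to compare against.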
I think it would be sensible to introduce a "simple" interface which allows you to take over the entire encoding process, something like:
    public interface ITextEncoder
    {
        bool WillEncode(Rune value);
        bool TryEncodeValue(Rune value, Span<char> destination, out int charsWritten);
    }
(And in fact this is how the encoders were initially designed all those years back, before they were copied in to .NET Core 1.0.)
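To make the proposal concrete, here is a sketch of what implementing such an interface might look like. `ITextEncoder` does not exist in the standard library; it is copied from the proposal above. `ODataStyleEncoder`, `SimpleEncoderDriver`, and the assumption that `TryEncodeValue` is only called for values where `WillEncode` returns `true` are all hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical interface, copied from the proposal; not a real .NET API.
public interface ITextEncoder
{
    bool WillEncode(Rune value);
    bool TryEncodeValue(Rune value, Span<char> destination, out int charsWritten);
}

// Hypothetical implementation: escapes control chars, non-ASCII chars, '"' and '\'.
public sealed class ODataStyleEncoder : ITextEncoder
{
    public bool WillEncode(Rune value) =>
        value.Value < 0x20 || value.Value > 0x7F
        || value.Value == '"' || value.Value == '\\';

    public bool TryEncodeValue(Rune value, Span<char> destination, out int charsWritten)
    {
        string escaped;
        if (value.Value == '"') escaped = "\\\"";
        else if (value.Value == '\\') escaped = "\\\\";
        else
        {
            // One \uXXXX escape per UTF-16 code unit (two for supplementary code points).
            Span<char> units = stackalloc char[2];
            int count = value.EncodeToUtf16(units);
            var sb = new StringBuilder();
            for (int i = 0; i < count; i++) sb.Append("\\u").Append(((int)units[i]).ToString("X4"));
            escaped = sb.ToString();
        }
        if (!escaped.AsSpan().TryCopyTo(destination))
        {
            charsWritten = 0;
            return false;
        }
        charsWritten = escaped.Length;
        return true;
    }
}

// Hypothetical driver showing how a caller could use the interface without unsafe code.
public static class SimpleEncoderDriver
{
    public static string Encode(string input, ITextEncoder encoder)
    {
        var output = new StringBuilder();
        Span<char> buffer = stackalloc char[12];
        foreach (Rune rune in input.EnumerateRunes())
        {
            if (encoder.WillEncode(rune) && encoder.TryEncodeValue(rune, buffer, out int written))
            {
                ReadOnlySpan<char> escaped = buffer.Slice(0, written);
                output.Append(escaped);
            }
            else
            {
                output.Append(rune.ToString());
            }
        }
        return output.ToString();
    }
}
```

Compared with subclassing `JavaScriptEncoder`, nothing here needs pointers or `unsafe`, which is the appeal of a "simple" interface.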
@GrabYourPitchforks thanks for following up and thanks for the detailed insights as well as the linked issues. Since the new writer we have implemented in our serializer requires users to explicitly opt in to it, we have settled on making it clear in the documentation that the byte-for-byte output and encoding might differ from the default. We'll also let users pass their preferred `JavaScriptEncoder` to be used. In the extreme case, they can subclass `JavaScriptEncoder` and override the escaping behaviour.
The API you have proposed looks good. Is it likely to make it in the standard library in the foreseeable future?
> The API you have proposed looks good. Is it likely to make it in the standard library in the foreseeable future?
This isn't in consideration for .NET 7. @habbes Would you like to refine the original proposal here to align with @GrabYourPitchforks' proposal to see if this could get queued up for API Review?
Has there been any movement on this? I'm (unfortunately) working on migrating a large, old .NET Framework ASP.Net project to .NET 8 and we're having interminable problems with not having anything 100% equivalent to Json.Encode from Framework and no way to get there.
Background and motivation
The OData team is adopting `Utf8JsonWriter` to improve its JSON serialization performance. Currently it uses a custom-built `JsonWriter`. To minimize breaking changes and friction for our users, we would like the new writer to be compatible with the existing writer as far as the serialized output is concerned. One incompatibility that has emerged is how the two writers handle string escaping:
The OData writer by default escapes control chars (< `0x20`), non-ASCII chars (> `0x7F`) and characters like `", \, \n, \b, \f, \r, \t`.
None of the built-in `JavaScriptEncoder` implementations has matching escaping rules.
The `JavaScriptEncoder.Default` escapes all the characters the OData writer escapes, but it also escapes HTML-sensitive characters like `<` and `>`, which OData does not. It also escapes the double quote using `\u0022` where OData escapes it using a backslash: `\"`.
The `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` does not escape HTML-sensitive characters, but it also does not escape non-ASCII characters (> `0x7F`).
I tried to create a custom `TextEncoderSettings` object to explicitly allow the characters that I do not want to be escaped. I explicitly allowed characters like `<` and passed it to `JavaScriptEncoder.Create(settings)`. But the HTML characters were still escaped. It seems like `JavaScriptEncoder.Create` is hardwired to forbid HTML characters even when the user explicitly allows them.
`JavaScriptEncoder.Create` calls the constructor of the internal `DefaultJavaScriptEncoder(TextEncoderSettings settings, bool allowMinimalJsonEscaping)`. This creates an `OptimizedInboxTextEncoder` with the option to forbid HTML characters depending on whether `allowMinimalJsonEscaping` is set to `true` or `false`. This `allowMinimalJsonEscaping` is set to `false` when creating an encoder with custom settings, and there does not seem to be any option for the user to enable it.
It would be great if the user had the option to set `allowMinimalJsonEscaping` to `true` when calling `JavaScriptEncoder.Create`, or any alternative that allows bypassing the HTML escaping.
API Proposal
API Usage
Alternative Designs
Alternatively, you can change the behaviour of `JavaScriptEncoder.Create(TextEncoderSettings)` such that it does not forbid HTML-sensitive characters. But this would be a breaking change.
Risks
Allowing HTML-sensitive characters presents the same risks as using the existing `JavaScriptEncoder.UnsafeRelaxedJsonEscaping`. Those risks are outlined in these docs. Our use case is sending a JSON response when the `application/json; charset = utf-8` header is set.
AspNetCore also uses `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` for JSON serialization by default.
The new API would essentially be `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` with a bit more control to escape additional characters.