dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.63k stars 3.15k forks source link

Stop escaping Unicode characters unnecessarily in relational JSON #32152

Open ajcvickers opened 10 months ago

ajcvickers commented 10 months ago

The JSON stored in a SQL Server column always has its Unicode characters escaped. For example, this code:

context.Add(new Customer { Json = new MyJson { Be = "Füzér Castle in the Zemplén Mountains" } } );
await context.SaveChangesAsync();

Results in the following JSON in the database:

{"Be":"F\u00FCz\u00E9r Castle in the Zempl\u00E9n Mountains"}

However, there is nothing preventing the JSON being stored as:

{"Be":"Füzér Castle in the Zemplén Mountains"}

EF reads the correct string back in both cases, but other tools may not be expecting escaped JSON for all Unicode characters.

Test code:

using (var context = new SomeDbContext())
{
    await context.Database.EnsureDeletedAsync();
    await context.Database.EnsureCreatedAsync();

    context.Add(new Customer { Json = new MyJson { Be = "Füzér Castle in the Zemplén Mountains" } });

    await context.SaveChangesAsync();
}

using (var context = new SomeDbContext())
{
    foreach (var c in context.Set<Customer>().ToList())
    {
        Console.WriteLine(c.Json.Be);
    }
}

public class SomeDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Data Source=(LocalDb)\MSSQLLocalDB;Database=AllTogetherNow")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>().OwnsOne(e => e.Json).ToJson();
    }
}

public class Customer
{
    public int Id { get; set; }
    public MyJson Json { get; set; }
}

public class MyJson
{
    public string? Be { get; set; }
}
maumar commented 10 months ago

related: #30315

roji commented 10 months ago

Here's the relevant docs page on encoding/escaping in System.Text.Json; we may want to specify JavaScriptEncoder.UnsafeRelaxedJsonEscaping.

However, for people using non-Unicode encodings in the database, or wanting to interact with the JSON documents from other tools which may not support special chars, we should probably have some sort of mechanism for specifying the encoding/escaping policy; this basically boils down to allowing users to specify the JsonSerializerOptions (which has this one it).

Maybe it makes sense to keep the current behavior as the default (as that's the S.T.Json default), and just tell people to specify UnsafeRelaxedJsonEscaping via JsonSerializerOptions if they want unescaped chars.

maumar commented 4 months ago

related: https://github.com/dotnet/efcore/issues/33443

We also escape all unicode chars in JSON property names, however when building a JSON path for query or update we don't escape those chars. This leads to us not fetching/updating the requested JSON fragments

roji commented 3 months ago

Note that we'd need to not escape - and deal with non-escaped JSON data - everywhere in our stack (SaveChanges, JSON path generation, shaper - see #33771.

As in https://github.com/dotnet/efcore/pull/33771#pullrequestreview-2092452772, we should consider doing this for 9.0 as well, since #33771 is a breaking change for people querying externally-inputted JSON data that isn't escaped.

maumar commented 3 months ago

Note for the implementer: (as pointed out by @Charlieface here: https://github.com/dotnet/efcore/issues/33443#issuecomment-2140160687) when we allow un-escaped JSON strings, we need to escape single and double quotes and also pre-pend N to the JSON path string.

roji commented 3 months ago

Design discussion: