Add snake_case support for System.Text.Json

hez2010 commented 5 years ago

Contents below were taken from @khellang, thanks!

API Proposal

Add additional casing support for System.Text.Json, based on this comment.

namespace System.Text.Json;

public partial class JsonNamingPolicy
{
    public static JsonNamingPolicy CamelCase { get; }
+    public static JsonNamingPolicy SnakeLowerCase { get; }
+    public static JsonNamingPolicy SnakeUpperCase { get; }
+    public static JsonNamingPolicy KebabLowerCase { get; }
+    public static JsonNamingPolicy KebabUpperCase { get; }
}

public enum JsonKnownNamingPolicy
{
    Unspecified = 0,
    CamelCase = 1,
+    SnakeLowerCase = 2,
+    SnakeUpperCase = 3,
+    KebabLowerCase = 4,
+    KebabUpperCase = 5,
}

Behavior Proposal

I propose the same behavior as Newtonsoft.Json, just like the existing camel case behavior. The implementation is here and the tests are here.

Other Comments

I think snake case naming is pretty common, especially in the Ruby world, and should be supported out of the box. GitHub's API is probably the most popular example using this naming convention off the top of my head.

Implements

See: dotnet/corefx#40003

Gnbrkm41 commented 5 years ago

If you have any designs in mind, could you update the top post with what the proposed APIs would look like / how commonly used snake case is / etc? Here's what I think is a good example: https://github.com/dotnet/corefx/issues/28933#issue-312415055. This should help people better understand the problem/importance of this issue, and may come up with improvements.

khellang commented 5 years ago

API Proposal

namespace System.Text.Json
{
    public abstract class JsonNamingPolicy
    {
        protected JsonNamingPolicy() { }
        public static JsonNamingPolicy CamelCase { get { throw null; } }
+       public static JsonNamingPolicy SnakeCase { get { throw null; } }
        internal static JsonNamingPolicy Default { get { throw null; } }
        public abstract string ConvertName(string name);
    }

+   internal sealed class JsonSnakeCaseNamingPolicy : JsonNamingPolicy
+   {
+       public override string ConvertName(string name) { throw null; }
+   }
}

Behavior Proposal

I propose the same behavior as Newtonsoft.Json, just like the existing camel case behavior. The implementation is here and the tests are here.

Other Comments

I think snake case naming is pretty common, especially in the Ruby world, and should be supported out of the box. GitHub's API is probably the most popular example using this naming convention off the top of my head.

UPDATE: Looks like dotnet/corefx#40003 implemented more or less this exact proposal.

hez2010 commented 5 years ago

@khellang I've pasted your proposal contents to the top post, thanks!

xsoheilalizadeh commented 5 years ago

I've prepared a benchmark that compares other implementation with dotnet/corefx#40003 PR.


BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview8-013656
  [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT

|                          Method |        Mean |      Error |     StdDev | Rank |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------------------- |------------:|-----------:|-----------:|-----:|-------:|------:|------:|----------:|
|  ToSnakeCaseStringBuilderBySpan |    85.97 ns |  1.0500 ns |  0.9821 ns |    1 | 0.0637 |     - |     - |     200 B |
| ToSnakeCaseNewtonsoftJsonBySpan |    86.35 ns |  1.7816 ns |  1.4877 ns |    1 | 0.0484 |     - |     - |     152 B |
|       ToSnakeCaseNewtonsoftJson |    88.89 ns |  0.8417 ns |  0.7461 ns |    2 | 0.0484 |     - |     - |     152 B |
|               ToSnakeCaseBySpan |   122.01 ns |  1.9066 ns |  1.6902 ns |    3 | 0.0253 |     - |     - |      80 B |
|                 ToSnakeCaseLinq |   365.75 ns |  4.4839 ns |  3.5008 ns |    4 | 0.1450 |     - |     - |     456 B |
|                ToSnakeCaseRegex | 2,108.71 ns | 40.6829 ns | 31.7625 ns |    5 |

xsoheilalizadeh commented 5 years ago

I investigated a couple of weeks about this case and tried other approaches + dotnet/corefx#40003 that have better performance to having snake case naming policy. Finally, I got the best result and it have less allocation and fast execution. My last implementation is ToSnakeCaseBySpan . Shoud I open a new PR for this?

|                          Method |        Mean |      Error |     StdDev | Rank |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------------------- |------------:|-----------:|-----------:|-----:|-------:|------:|------:|----------:|
|               ToSnakeCaseBySpan |    27.47 ns |  0.4629 ns |  0.4330 ns |    1 | 0.0153 |     - |     - |      48 B |
|  ToSnakeCaseStringBuilderBySpan |    85.23 ns |  1.6495 ns |  1.3774 ns |    2 | 0.0637 |     - |     - |     200 B |
|       ToSnakeCaseNewtonsoftJson |    85.72 ns |  1.6418 ns |  1.4554 ns |    2 | 0.0484 |     - |     - |     152 B |
| ToSnakeCaseNewtonsoftJsonBySpan |    86.96 ns |  1.7060 ns |  1.5958 ns |    2 | 0.0484 |     - |     - |     152 B |
|                 ToSnakeCaseLinq |   353.42 ns |  3.9670 ns |  3.7108 ns |    3 | 0.1450 |     - |     - |     456 B |
|                ToSnakeCaseRegex | 2,056.69 ns | 29.5694 ns | 26.2125 ns |    4 | 0.1526 |     - |     - |     496 B |

\cc @karelz

hez2010 commented 5 years ago

@xsoheilalizadeh I think the name of a property is usually not too long, so maybe you can use stackalloc to reduce memory allocation on heap to reduce GC pressure? (It may achieve zero allocation)

public unsafe ReadOnlySpan<char> ToSnakeCaseBySpan()
{
    ...
    Span<char> buffer = stackalloc char[bufferSize];
    ...
}

xsoheilalizadeh commented 5 years ago

Nice suggestion, but here is a question how I should return buffer, I've got following:

Cannot use local 'buffer' in this context because it may expose referenced variables outside of their declaration scope

Gnbrkm41 commented 5 years ago

Should I open a new PR for this?

Nope, not yet 😅 not until the API is approved.

hez2010 commented 5 years ago

Because buffer now is allocated on the stack, you cannot return it directly since it will be disposed after return. To solve it, you can simply change return buffer to return buffer.ToString();

Sent from Mailhttps://go.microsoft.com/fwlink/?LinkId=550986 for Windows 10

From: Soheil Alizadeh notifications@github.com Sent: Friday, September 13, 2019 3:48:38 PM To: dotnet/corefx corefx@noreply.github.com Cc: Steve hez2010@outlook.com; Author author@noreply.github.com Subject: Re: [dotnet/corefx] Add UnderScoreCase support for System.Text.Json (#39564)

Nice suggestion, but here is a question how I should return buffer, I've got following:

Cannot use local 'buffer' in this context because it may expose referenced variables outside of their declaration scope

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fdotnet%2Fcorefx%2Fissues%2F39564%3Femail_source%3Dnotifications%26email_token%3DADSENWIDJVGBMFT3T4V3U5LQJNAVNA5CNFSM4IEL7W4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6UIBGI%23issuecomment-531136665&data=02%7C01%7C%7C68574ff9a79b43fbaf2a08d7381ecaaf%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637039577204910801&sdata=PhAc8%2BvEwr9AHgXI1vNdlLENvuKw1jDC6x%2F%2FNuc7PDk%3D&reserved=0, or mute the threadhttps://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FADSENWIHQJ7DC64RZH2RQFDQJNAVNANCNFSM4IEL7W4A&data=02%7C01%7C%7C68574ff9a79b43fbaf2a08d7381ecaaf%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637039577204920805&sdata=M9kuqi6W4P3Msx76loMm8q2laC%2FrNgb4ZwIozAwcDZ4%3D&reserved=0.

karelz commented 5 years ago

@ahsonkhan are we ready to review the API? @xsoheilalizadeh is eager to implement it ...

xsoheilalizadeh commented 5 years ago

I've figured out that the ToSnakeCaseBySpan can't pass these tests, it's just compatible with pascal case properties, I've tried to make it compatible with other formats but unfortunately, I've failed and it'd be unreadable. I was hurrying to contribute and got a bad result, sorry.

ahsonkhan commented 5 years ago

This looks like a reasonable API addition. I was considering how common such formats are to warrant adding support built-in (rather than folks creating their own naming policy to support it), and looks like quite a few places use snake_case including Facebook, AWS, and Twitter APIs (aside from the fact that Json.NET supports it): https://stackoverflow.com/questions/5543490/json-naming-convention

And of course, what @khellang said:

I think snake case naming is pretty common, especially in the Ruby world, and should be supported out of the box. GitHub's API is probably the most popular example using this naming convention off the top of my head.

cc @steveharter

terrajobst commented 5 years ago

Video

Makes sense. We also considered others (such as kebap casing) but it seems we should defer that until someone actually needs that.

Gnbrkm41 commented 5 years ago

@xsoheilalizadeh, are you still working on it? :D you should be able to open PRs for it.

xsoheilalizadeh commented 5 years ago

Unfortunately no till I pass tests, I tried to pass tests but my implementation only support pascal-case naming for example FirstName will be first_name but in other cases like IsCIA it will be Is_c_i_a that isn't expected.

hez2010 commented 5 years ago

@Gnbrkm41 How about the existing PR dotnet/corefx#40003

Gnbrkm41 commented 5 years ago

I mean... sure, why not, as long as it has all the tests and it passes :D

YohDeadfall commented 4 years ago

@ahsonkhan Did the same thing already, so taking this.

VassilAtanasov commented 4 years ago

Makes sense. We also considered others (such as kebap casing) but it seems we should defer that until someone actually needs that.

The lack of PascalCase option is a show stopper for our migration. I believe there should be a "preserve case" option which will also work for us.

khellang commented 4 years ago

The lack of PascalCase option is a show stopper for our migration. I believe there should be a "preserve case" option which will also work for us.

Isn't that the default? 🤔

https://github.com/dotnet/corefx/blob/296c0e71ddcc4885d30af18dea829c1923cfdc5c/src/System.Text.Json/src/System/Text/Json/Serialization/JsonDefaultNamingPolicy.cs#L9

VassilAtanasov commented 4 years ago

The lack of PascalCase option is a show stopper for our migration. I believe there should be a "preserve case" option which will also work for us.

Isn't that the default? 🤔

https://github.com/dotnet/corefx/blob/296c0e71ddcc4885d30af18dea829c1923cfdc5c/src/System.Text.Json/src/System/Text/Json/Serialization/JsonDefaultNamingPolicy.cs#L9

Yes - that worked ... but not easily! You can not select the Default !!! JsonSerializerOptions.PropertyNamingPolicy = JsonNamingPolicy.Default and also PropertyNamingPolicy gets initialised in constructor to JsonCamelCaseNamingPolicy new JsonOptions().JsonSerializerOptions.PropertyNamingPolicy

However you can trick it with :-) JsonSerializerOptions.PropertyNamingPolicy = null; and PascalCase goes through unobstructed!

Gnbrkm41 commented 4 years ago

the default is to use exact match IIRC. setting it to null also does that as well.

niemyjski commented 4 years ago

I'd prefer if we could just define our own naming strategies. We use our own naming method for snake_case because it occurred before Json.net came out with it and it's slightly different. So perhaps just allow us to set a naming policy or use the default one.

khellang commented 4 years ago

@niemyjski Of course you can. Just implement JsonNamingPolicy like the built-in ones and set that. Almost everything about the new serializer is configurable and/or extensible, these issues are usually just about including more stuff in the box.

sveinfid commented 4 years ago

Please could KebabCase also be added?

It should be a very minor addition - basically the same as SnakeCase but with - in place of _

E.g. requests sent to Apple Push Notification Service require kebab-case JSON fields: https://developer.apple.com/documentation/usernotifications/setting_up_a_remote_notification_server/generating_a_remote_notification#2943360

ahsonkhan commented 4 years ago

@sveinfid - There was precedence for snake casing (e.g. in newtonsoft json) and enough user need to provide built-in support for it. I am not sure that's the case for KebabCase. Please file a separate issue (with the API proposal) and we can consider it, but my hunch is, this is something that the caller would have to implement (like any other custom casing/naming strategy) and doesn't seem common enough to require built-in support.

Like you said, the impl could be relatively straight forward (take existing snake case impl and replace the delimiter), which means there is also little value in providing one built-in.

layomia commented 4 years ago

From @YohDeadfall in https://github.com/dotnet/runtime/issues/673:

Originally proposed by @hez2010 in dotnet/corefx#39564.

@ahsonkhan, could you add the original issue here and important comments from dotnet/corefx#41354?

@karelz We need your opinion here, since @khellang asked a question about altering the existing behavior of the camel case policy too follow common translator rules for the JS world as splitting a string by words and concatenating it back using some separator and applying a policy to change casing for each word. It's a breaking change and @stephentoub likes it. Me too. See dotnet/corefx#41354 (comment).

I still would like to drive this feature.

@steveharter @layomia

J0rgeSerran0 commented 4 years ago

Only FYI

If someone reading this needs to implement this asap, you could find useful the next classes to use a naming policy for SnakeCase and KebabCase.

https://github.com/J0rgeSerran0/JsonNamingPolicy

I hope that helps.

artemsv commented 4 years ago

Thank you, @J0rgeSerran0, for your JsonSnakeCaseNamingPolicy! Going to use it while we are waiting for the built-in support...

thomaslevesque commented 4 years ago

and looks like quite a few places use snake_case including Facebook, AWS, and Twitter APIs

It's also used in all OAuth 2.0 / OpenID Connect responses (token response, discovery document, etc.) I think it's really important to support it out of the box; right now I have to fall back to Newtonsoft.Json to handle it.

khellang commented 4 years ago

right now I have to fall back to Newtonsoft.Json to handle it.

@thomaslevesque You've seen the various workarounds in this thread, right?

thomaslevesque commented 4 years ago

@khellang I haven't seen any real workaround mentioned in this thread... I could implement my own naming policy or use the one proposed by @J0rgeSerran0, but it's not worth the trouble in my scenario. Eventually I didn't use Newtonsoft.Json, I just put [JsonPropertyName] attributes on all my properties.

J0rgeSerran0 commented 4 years ago

Thanks for comment @thomaslevesque Anyway, I am curious to know which are the impediments that you have in your scenario or why System.Text.Json and/or the workarounds are not worth the trouble. Do you have any sample code to check this? It could help us to improve System.Text.Json or the workarounds that you have used. Thanks a lot!

thomaslevesque commented 4 years ago

Anyway, I am curious to know which are the impediments that you have in your scenario or why System.Text.Json and/or the workarounds are not worth the trouble.

In a real application, I would probably use your policy implementation. But in this case it's just a demo app to illustrate a blog article, so I want to keep the code terse and simple. Having a whole snake case naming policy implementation would just add noise.

J0rgeSerran0 commented 4 years ago

Hi Thomas.

Now I better understand the situation. Good to know that you are talking about a special situation in this case, and no about a general situation. This is an important point to avoid possible misunderstandings.

Thanks a lot!

jonathann92 commented 4 years ago

@xsoheilalizadeh I would like to also add that you might want to change this line to remove the && name[i] != name[0] and start the for loop at i = 1. One test case is if the property name is FirstNameFirst would result in First_Name_Firs

https://github.com/xsoheilalizadeh/SnakeCaseConversion/blob/master/SnakeCaseConversionBenchmark/SnakeCaseConventioneerBenchmark.cs#L55

jonathann92 commented 4 years ago

Unfortunately no till I pass tests, I tried to pass tests but my implementation only support pascal-case naming for example FirstName will be first_name but in other cases like IsCIA it will be Is_c_i_a that isn't expected.

@xsoheilalizadeh I've built off your solution to get it working. Let me know if you can think of any other test cases

public class UnitTest1
    {
        [Theory]
        [InlineData("FirstName", "First_Name")]
        [InlineData("FirstNameM", "First_Name_M")]
        [InlineData("StreetAddress1", "Street_Address_1")]
        [InlineData("StreetAddress1Two34", "Street_Address_1_Two_34")]
        [InlineData("IsCIA", "Is_Cia")]
        [InlineData("IsCIAAgent", "Is_Cia_Agent")]
        public void Test1(string input, string expected)
        {
            Assert.Equal(expected, ConvertNameSnakeCase(input));
        }

        public string ConvertNameSnakeCase(string name)
        {
            int numOfUnderscores = 0;

            int namePosition = 1;
            while (namePosition < name.Length - 1)
            {
                if (
                    (char.IsUpper(name[namePosition]) && (char.IsLower(name[namePosition + 1]) || !char.IsUpper(name[namePosition - 1]))) ||
                    (char.IsNumber(name[namePosition]) && !char.IsNumber(name[namePosition - 1]))
                   )
                {
                    numOfUnderscores++;
                }
                ++namePosition;
            }

            if (
                (char.IsUpper(name[namePosition]) && !char.IsUpper(name[namePosition - 1])) ||
                (char.IsNumber(name[namePosition]) && !char.IsNumber(name[namePosition - 1]))
               )
            {
                numOfUnderscores++;
            }

            int bufferSize = name.Length + numOfUnderscores;

            Span<char> buffer = new char[bufferSize];

            int bufferPosition = 0;
            namePosition = 0;

            buffer[bufferPosition++] = name[namePosition++];

            while (namePosition < name.Length - 1)
            {
                if (
                    (char.IsUpper(name[namePosition]) && (char.IsLower(name[namePosition + 1]) || !char.IsUpper(name[namePosition - 1]))) ||
                    (char.IsNumber(name[namePosition]) && !char.IsNumber(name[namePosition - 1]))
                    )
                {
                    buffer[bufferPosition] = '_';
                    buffer[bufferPosition + 1] = name[namePosition];
                    bufferPosition += 2;
                    namePosition++;
                    continue;
                }

                buffer[bufferPosition] = char.ToLower(name[namePosition]);
                bufferPosition++;

                namePosition++;
            }

            if (
                (char.IsUpper(name[namePosition]) && !char.IsUpper(name[namePosition - 1])) ||
                (char.IsNumber(name[namePosition]) && !char.IsNumber(name[namePosition - 1]))
                )
            {
                buffer[bufferPosition] = '_';
                buffer[bufferPosition + 1] = name[namePosition];
                return new string(buffer);
            }

            buffer[bufferPosition] = char.ToLower(name[namePosition]);
            return new string(buffer);
        }
    }

xsoheilalizadeh commented 4 years ago

@jonathann92 There is a PR that already implemented built-in snake-case naming policy, you can us its tests here.

layomia commented 4 years ago

@YohDeadfall, re https://github.com/dotnet/runtime/issues/876#issuecomment-629428887:

Can you remind me what the hold-up/issue was on the implementation for this - https://github.com/dotnet/corefx/pull/41354? I see an approval, and the only unaddressed feedback seemed to be around caching conversion results for dictionary keys (which doesn't need to be addressed as part of this work).

YohDeadfall commented 4 years ago

While Stephen approved the PR, there are some open questions in https://github.com/dotnet/corefx/pull/41354#issuecomment-554958476. No one gave comments on them.

By the way, if my mind serves me Toub said that the camel casing policy should share the same logic with other policies, but right now it just takes the first character and lower cases it while other platforms do it in a different way.

layomia commented 4 years ago

Thanks @YohDeadfall. I'll digest this and follow up.

YohDeadfall commented 4 years ago

@layomia @terrajobst Do I correctly understand that the team isn't willing to introduce the feature in .NET 5? Am really interested in it since it intersects with some stuff in Npgsql, and we should have the standard way to convert strings using the snake case convention.

titanix commented 4 years ago

Why is this isn't in the library already? This is a very basic feature for a JSON library. System.Text.Json is not usable as it is.

layomia commented 4 years ago

Due to other commitments for 5.0, this will have to come in 6.0. Custom implementations can be utilized until then via JsonSerializerOptions.PropertyNamingPolicy.

atrauzzi commented 3 years ago

One year for 6.0 is just way too long for minor stuff like this. I've mentioned this in other .net projects before, but I feel like you guys need to allow yourselves interim feature updates. No breaking changes or anything.

Think about how any other language/framework or ecosystems would do it.

cc @terrajobst -- Does "one year" satisfy any definition of being competitive in our industry? It looks more like a regression to what was comfortable in the Windows-only days.

YohDeadfall commented 3 years ago

@layomia, there's a description on word boundary detection from Unicode (see it Unicode Text Segmentation). Currently I'm implementing it as an external library to test and open a proposal for adding that feature to the runtime. It should allow iteration over a string splitting it by words and providing them as read only spans on which policies can operate on.

YohDeadfall commented 3 years ago

Hello, @layomia and @ericstj! I prototyped a naming convention policies and you can observe them in Yoh.Text.Json.NamingPolicies. There's one think I'm not happy about - the project uses Unicode text segmentation which while allows to handle mostly any case (no space languages like Japanese might be an issue even while it works according to Annex 29) it looks to heavy. My thoughts on it is to drop dependency on Unicode segmentation library and ignore any non-letter or non-digit characters. These characters should be treated as word boundaries.

Test cases currently used (see source for other policies):

        [Theory]
        [InlineData("XMLHttpRequest", "xml_http_request")]
        [InlineData("camelCase", "camel_case")]
        [InlineData("CamelCase", "camel_case")]
        [InlineData("snake_case", "snake_case")]
        [InlineData("SNAKE_CASE", "snake_case")]
        [InlineData("kebab-case", "kebab_case")]
        [InlineData("KEBAB-CASE", "kebab_case")]
        [InlineData("double  space", "double_space")]
        [InlineData("double__underscore", "double_underscore")]
        [InlineData("abc", "abc")]
        [InlineData("abC", "ab_c")]
        [InlineData("aBc", "a_bc")]
        [InlineData("aBC", "a_bc")]
        [InlineData("ABc", "a_bc")]
        [InlineData("ABC", "abc")]
        [InlineData("abc123def456", "abc123def456")]
        [InlineData("abc123Def456", "abc123_def456")]
        [InlineData("abc123DEF456", "abc123_def456")]
        [InlineData("ABC123DEF456", "abc123def456")]
        [InlineData("ABC123def456", "abc123def456")]
        [InlineData("Abc123def456", "abc123def456")]
        public void SnakeLowerCase(string input, string output) =>
            Assert.Equal(output, JsonNamingPolicies.SnakeLowerCase.ConvertName(input));

The next thing I'm going to do with policies is to make them more generic, so they can handle camel and pascal cases as well using the same method, but different WriteWord implementation as @stephentoub suggested before in the original PR.

/cc @khellang

hez2010 commented 3 years ago

Test cases currently used (see source for other policies):

@YohDeadfall What about heading and tailing blank spaces?

[InlineData("  abc", "abc")]
[InlineData("abc  ", "abc")]
[InlineData("  abc  ", "abc")]
[InlineData("  abc def  ", "abc_def")]

YohDeadfall commented 3 years ago

Since it's powered by Unicode segmentation library, it will correctly take words from the input, transform them and join by the specified boundaries (tests in the library used under the hood).

Added your cases too (YohDeadfall/Yoh.Text.Json.NamingPolicies@82e23da5741956d8acc15669c0217502e30f2269), thank you!

YohDeadfall commented 3 years ago

Even without the text segmentation library it's not a problem to extract words, but it might be less accurate in that case since words might contain characters not falling into the letter or digit category.

dotnet / runtime