danipen / TextMateSharp

A port of tm4e to bring TextMate grammars to dotnet ecosystem
MIT License
87 stars 15 forks source link

Switch to System.Text.Json? #34

Closed michael-hawker closed 1 year ago

michael-hawker commented 1 year ago

System.Text.Json is compatible still with .NET Standard via NuGet package. Also, they've added source generator support for even better performance and reducing of reflection.

Was wondering if there's any plans/thoughts on migrating to the new infrastructure from Newtonsoft.Json?

danipen commented 1 year ago

There are no plans now ... but it seems to be easily doable. Newtonsoft library is used only to load theme and grammars data. So the replacement should be straightforward.

I'm quite busy now so feel free to open a new PR if you need those changes soon.

Numpsy commented 1 year ago

fwiw I had a go at converting RegistryOptions in https://github.com/Numpsy/TextMateSharp/commit/ba66148c95fbc090acee93d5bece97bc6ebd5d76.

I tried having a quick go with BenchmarkDotNet and that showed a trivial perf change creating a RegistryOptions (though it barely takes any time as is), but it did reduce the memory allocations as well.

danipen commented 1 year ago

Hi Richard, thanks! Could you share the benchmark results?

Numpsy commented 1 year ago

A benchmark that just creates an instance of RegistryOptions (which calls InitializeAvailableGrammars internally) gave me this:

Current code:

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2130/21H2/November2021Update), VM=Hyper-V
11th Gen Intel Core i7-1185G7 3.00GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]               : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT AVX2
  .NET 6.0             : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8 (4.8.4515.0), X64 RyuJIT VectorSize=256

| Method |                  Job |              Runtime |       Mean |    Error |   StdDev |    Gen0 |    Gen1 | Allocated |
|------- |--------------------- |--------------------- |-----------:|---------:|---------:|--------:|--------:|----------:|
|   Load |             .NET 6.0 |             .NET 6.0 |   996.7 us | 14.89 us | 13.20 us | 95.7031 | 31.2500 | 593.89 KB |
|   Load | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 1,193.8 us | 11.69 us | 10.94 us | 99.6094 | 33.2031 | 618.91 KB |

With System.Text.Json:

BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2130/21H2/November2021Update), VM=Hyper-V
11th Gen Intel Core i7-1185G7 3.00GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]               : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT AVX2
  .NET 6.0             : .NET 6.0.10 (6.0.1022.47605), X64 RyuJIT AVX2
  .NET Framework 4.7.2 : .NET Framework 4.8 (4.8.4515.0), X64 RyuJIT VectorSize=256

| Method |                  Job |              Runtime |       Mean |    Error |   StdDev |    Gen0 |   Gen1 | Allocated |
|------- |--------------------- |--------------------- |-----------:|---------:|---------:|--------:|-------:|----------:|
|   Load |             .NET 6.0 |             .NET 6.0 |   747.8 us |  6.93 us |  6.14 us | 23.4375 | 7.8125 | 143.63 KB |
|   Load | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 1,344.5 us | 11.64 us | 10.32 us | 23.4375 | 7.8125 | 151.58 KB |

But the amount of time it takes is so short that the times vary quite noticably between runs, so all it's really showing is that allocated memory goes down.

This is just from the deserialization of GrammarDefinition inside RegistryOptions though, I haven't tried changing JSONPListParser or seeing if the source generator support does anything.

danipen commented 1 year ago

It would be interesting to compare parsing the themes and the grammars, but it seems that is going to be faster.

Numpsy commented 1 year ago

I'll try to give it a go later

Numpsy commented 1 year ago

I had a quick go at changing JSONPListParser.Parse to use Utf8JsonReader instead of JsonTextReader and there didn't seem to be much of a difference in performance parsing the C# grammar (I suppose that walking forward through the data taking the bits you want has less perf difference between the two libraries than the reflection based approaches).

Saying that, the existing Parse function takes a StreamReader to read from and Utf8JsonReader wants a utf-8 span so I did an extra conversion to get the data, and maybe it would be more efficient if utf-8 data could be read directly from the source stream.

danipen commented 1 year ago

AFAIK utf8 is the default encoding isn't it?

Numpsy commented 1 year ago

Yes, but I don't think you can get utf-8 data out of StreamReader directly, so the string data read from it needs to be converted back into utf-8 (ReadOnlySpan) before you can give it to Utf8JsonReader ?

Numpsy commented 1 year ago

So, I didn't previously have chance to look at this again, but have rebased on top of the latest changes in https://github.com/Numpsy/TextMateSharp/commit/646b6dcaefc3b7a1b7ce16278750ee2cce2585a5

Updated benchmarks. Before

| Method |                  Job |              Runtime |     Mean |     Error |    StdDev |   Median |     Gen0 |    Gen1 | Allocated |
|------- |--------------------- |--------------------- |---------:|----------:|----------:|---------:|---------:|--------:|----------:|
|   Load |             .NET 6.0 |             .NET 6.0 | 1.139 ms | 0.0212 ms | 0.0418 ms | 1.120 ms |  97.6563 | 21.4844 | 604.32 KB |
|   Load |             .NET 7.0 |             .NET 7.0 | 1.063 ms | 0.0210 ms | 0.0486 ms | 1.048 ms |  97.6563 | 21.4844 | 604.32 KB |
|   Load | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 1.250 ms | 0.0228 ms | 0.0375 ms | 1.241 ms | 101.5625 | 25.3906 | 629.85 KB |

After

| Method |                  Job |              Runtime |       Mean |    Error |   StdDev |    Gen0 |   Gen1 | Allocated |
|------- |--------------------- |--------------------- |-----------:|---------:|---------:|--------:|-------:|----------:|
|   Load |             .NET 6.0 |             .NET 6.0 |   767.2 us |  5.67 us |  5.31 us | 23.4375 | 4.8828 | 145.75 KB |
|   Load |             .NET 7.0 |             .NET 7.0 |   754.8 us | 10.89 us |  9.65 us | 23.4375 | 6.8359 | 147.28 KB |
|   Load | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 1,366.3 us | 16.67 us | 15.59 us | 23.4375 | 5.8594 | 153.85 KB |

Turning on the source generator support didn't seem to make that test faster, though I think that's more about startup perf and avoiding reflection than runtime perf during deserlization.

Related: I'm not sure of all the details about how the source generator machinery interacts with the the KeepType stuff that was added for trimming support (i.e would all that still be needed if the source generator has generated hard references to all of the types?)

danipen commented 1 year ago

Related: I'm not sure of all the details about how the source generator machinery interacts with the the KeepType stuff that was added for trimming support (i.e would all that still be needed if the source generator has generated hard references to all of the types?)

Not sure. Maybe @hez2010 can answer that.

hez2010 commented 1 year ago

This shouldn't be related to my changes around trimming. Maybe you can upgrade System.Text.Json to 7.0 and try again. IIRC System.Text.Json only supports source generator since 7.0.

Turning on the source generator support didn't seem to make that test faster.

Isn't is already a lot faster than before? 1.063 ms (1063 us) vs 754.8 us which is a nearly 30% performance improvement.

As for .NET Framework, it uses a slow implementation of Span so it is expected to be slow.

Numpsy commented 1 year ago

Maybe you can upgrade System.Text.Json to 7.0 and try again. IIRC System.Text.Json only supports source generator since 7.0.

v6 according to https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/source-generation-modes?pivots=dotnet-6-0 (you can see if the linked commit that the JsonSerializerContext machniery is there)

Isn't is already a lot faster than before?

That's just from switching from Newtonsoft to STJ, whether or not the source genration is enabled doesn't effect that (but i think it would only improve the startup perf, and I haven't tried benchmarking that, just the loading of the grammars)

This shouldn't be related to my changes around trimming.

What I was wondering about that is if there is a need for KeepType<GrammarDefinition>() when the source generator has generated references to GrammarDefinition itself?

Numpsy commented 1 year ago

Anyway, is there any interest in a PR for the change I linked previously?

I set the System.Text.Json version to 6 as I thought that version might give the required functionality whilst also allowing .NET 6.0 consumers use the in-box version if they wanted, but version 7 should work as well

danipen commented 1 year ago

Yes feel free to submit a PR. I'll try it and if all works as expected I'll merge.

hez2010 commented 1 year ago

What I was wondering about that is if there is a need for KeepType<GrammarDefinition>() when the source generator has generated references to GrammarDefinition itself?

If you are using source generated json, then it is no longer being needed.

Numpsy commented 1 year ago

I think this one is done now?

danipen commented 1 year ago

Yess. Let me update the PR so this issue is automatically closed.