danipen / TextMateSharp

A port of tm4e to bring TextMate grammars to the dotnet ecosystem
MIT License

Parse grammar definitions with System.Text.Json #43

Closed. Numpsy closed this 1 year ago.

Numpsy commented 1 year ago

Refs https://github.com/danipen/TextMateSharp/issues/34

This is just replacing Newtonsoft.Json with System.Text.Json - it doesn't contain any of the changes to the trimming-related functionality mentioned in #34

Fixes #34
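For illustration, the core of the swap looks roughly like this; the helper name is hypothetical, serializer options are omitted, and GrammarDefinition is the project's existing model type:

```csharp
using System.Text.Json;

// Hypothetical helper showing the shape of the change; only the
// serializer call differs from the Newtonsoft-based version.
static GrammarDefinition ParseGrammarDefinition(string json)
{
    // Before (Newtonsoft.Json):
    //   return JsonConvert.DeserializeObject<GrammarDefinition>(json);

    // After (System.Text.Json):
    return JsonSerializer.Deserialize<GrammarDefinition>(json);
}
```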

danipen commented 1 year ago

Thanks for the contribution. Question: Should we use the generators to be compatible with AOT and remove the existing code?

Numpsy commented 1 year ago

I thought that being better for AOT was supposed to be one of the benefits of the source generator, but I haven't tested that myself
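For reference, a minimal sketch of the source-generated approach, assuming a context class is declared for GrammarDefinition (the context name below is illustrative, not necessarily what the PR uses):

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// The source generator emits the (de)serialization code at compile time,
// which is what makes this trim- and AOT-friendly: no reflection is needed.
[JsonSerializable(typeof(GrammarDefinition))]
internal partial class GrammarSerializationContext : JsonSerializerContext
{
}

// Usage: deserialize through the generated type info instead of reflection.
// GrammarDefinition definition = JsonSerializer.Deserialize(
//     json, GrammarSerializationContext.Default.GrammarDefinition);
```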

danipen commented 1 year ago

Sorry, maybe I didn't explain myself very well.

So my two questions are

  1. As you already added the generator attribute [JsonSerializable(typeof(GrammarDefinition))], should we remove the KeepType stuff added by @hez2010?

  2. Should we remove the Newtonsoft dep in the props file?

hez2010 commented 1 year ago

I think the answer to both is YES.

Numpsy commented 1 year ago

For (2), this change only touches the part that parses grammars - there is another use of Newtonsoft that handles plists at

https://github.com/danipen/TextMateSharp/blob/c1d4ad804efeede2475412f788731277ed170640/src/TextMateSharp/Internal/Parser/Json/JSONPListParser.cs#L21

danipen commented 1 year ago

That's used to read the grammar file. We should migrate that too.

Numpsy commented 1 year ago

I had a go a while back at replacing JsonTextReader with Utf8JsonReader, and I think the main issue I ran into was that the current APIs in this area all use Stream or StreamReader, whilst Utf8JsonReader operates on spans and sequences.

From what I recall, as long as JSONPListParser is given a StreamReader, it either needs to read the whole thing into an array to use as a span, or do something like https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/use-dom-utf8jsonreader-utf8jsonwriter?pivots=dotnet-7-0#read-from-a-stream-using-utf8jsonreader with chunks of data. (Maybe none of the files it's reading are big enough for that to be an issue, though - I just thought I'd ask first.)
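A sketch of the first option (buffer everything, then parse); the method name is illustrative and the token-handling loop is left empty:

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

// Sketch of the "read the whole thing into an array" option. Utf8JsonReader
// works on UTF-8 bytes, so the text from the StreamReader is re-encoded.
static void ParsePList(StreamReader contents)
{
    byte[] utf8 = Encoding.UTF8.GetBytes(contents.ReadToEnd());

    var reader = new Utf8JsonReader(utf8, isFinalBlock: true, state: default);
    while (reader.Read())
    {
        // Inspect reader.TokenType / reader.GetString() and build the
        // plist structure from the tokens.
    }
}
```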

danipen commented 1 year ago

Thanks for taking a look.

Yes, it's odd that the Utf8JsonReader doesn't allow reading from a Stream.

It seems that we need to do it manually: have a buffer, read chunks of bytes from the stream (say 2048 at a time), and pass them to the Utf8JsonReader as a span.
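A minimal sketch of that manual approach, assuming a 2048-byte buffer and following the pattern from the MS docs page linked above. Names are illustrative, and a real implementation would also need to grow the buffer if a single token is larger than one chunk:

```csharp
using System;
using System.IO;
using System.Text.Json;

static void ParsePListInChunks(Stream stream)
{
    byte[] buffer = new byte[2048];
    int bytesInBuffer = Fill(stream, buffer, 0);
    var reader = new Utf8JsonReader(
        buffer.AsSpan(0, bytesInBuffer),
        isFinalBlock: bytesInBuffer < buffer.Length,
        state: default);

    while (true)
    {
        while (reader.Read())
        {
            // Handle reader.TokenType here and build the plist structure.
        }

        if (reader.IsFinalBlock)
            break;

        // Keep the bytes the reader couldn't finish, slide them to the front
        // of the buffer, and top it up with the next chunk from the stream.
        int consumed = (int)reader.BytesConsumed;
        int leftover = bytesInBuffer - consumed;
        Buffer.BlockCopy(buffer, consumed, buffer, 0, leftover);

        bytesInBuffer = Fill(stream, buffer, leftover);
        reader = new Utf8JsonReader(
            buffer.AsSpan(0, bytesInBuffer),
            isFinalBlock: bytesInBuffer < buffer.Length,
            state: reader.CurrentState);
    }
}

// Read until the buffer is full or the stream ends, so that a short read
// in the middle of the stream isn't mistaken for the final block.
static int Fill(Stream stream, byte[] buffer, int offset)
{
    int total = offset;
    int read;
    while (total < buffer.Length &&
           (read = stream.Read(buffer, total, buffer.Length - total)) > 0)
    {
        total += read;
    }
    return total;
}
```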

danipen commented 1 year ago

@Numpsy thanks for the effort in bringing System.Text.Json to this project. Unfortunately, using the Utf8JsonReader to read from a stream is not trivial.

We could probably go with the current changes and let JSONPListParser still use Newtonsoft, but I'd prefer not to rely on two different mechanisms (which means downloading both Newtonsoft.Json and System.Text.Json from NuGet).

Perhaps the dotnet team will add a Utf8JsonStreamReader soon, and we can revisit this then.

Numpsy commented 1 year ago

Hmm, that looks more complicated than the MS example :-(

Another possibility might be to see how it'd look using JsonDocument instead of using the reader directly?
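In case it helps the comparison, a sketch of what the JsonDocument route might look like (the method name is illustrative); as noted below, it parses the whole document into memory:

```csharp
using System.IO;
using System.Text.Json;

// Parse the stream into a DOM and walk it, instead of driving
// Utf8JsonReader by hand. The whole document stays in memory
// until the JsonDocument is disposed.
static void ParsePListWithDocument(Stream stream)
{
    using JsonDocument document = JsonDocument.Parse(stream);
    JsonElement root = document.RootElement;

    foreach (JsonProperty property in root.EnumerateObject())
    {
        // Inspect property.Name and property.Value.ValueKind and build
        // the plist structure from the element tree.
    }
}
```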

danipen commented 1 year ago

I gave the MS example a try but I was unable to get it working.

I also gave the StackOverflow solution a try, which is not trivial. I'm getting exceptions there as well.

So I'm not confident this is going to work fine unless the dotnet team releases an official Utf8JsonStreamReader.

On the other hand, I'm not sure whether we can get the tokens from the JsonDocument, and unfortunately it requires loading the whole JSON grammar into memory before building the IRawGrammar, which doesn't sound like a good idea.

danipen commented 1 year ago

That being said, if you give it a try and you find a solution for it, please let me know :-)

danipen commented 1 year ago

I made a draft that uses System.Text.Json in the JSONPListParser. For now it loads the whole file into memory, but it works without exceptions. Not too bad, since the biggest grammar is ~655KB.

But this is not an acceptable solution. A requirement for this PR to be merged would be to read in chunks. If someone wants to take a look into it, it would be appreciated!

Numpsy commented 1 year ago

I'll try to give it another look next week

Numpsy commented 1 year ago

I found this old commit from when I was last playing with things, just in case it's of interest: https://github.com/Numpsy/TextMateSharp/commit/15b2cef0a492ce002acd40c00f8c3979fd65b6b4

Numpsy commented 1 year ago

I had a first go at making it read the data in chunks... I don't know if it's the best approach, but the unit tests are passing for me at least.