datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0

Asynchronous tokenizing and parsing #111

Closed: NickRedwood closed this issue 3 years ago

NickRedwood commented 4 years ago

Hi, firstly thanks for writing this library, I've found it very useful.

Are there any plans to support tokenizing (as a first step) data sources that are accessed asynchronously, e.g. a Stream of some type? That is, the code reads from the stream a buffer's worth at a time, and every so often needs an async call to get more data.

As I understand it, ValueTask would be a low-overhead way of achieving this (it avoids an allocation when the result is already available), so essentially we want the incoming data to be an IAsyncEnumerable, with the tokenizer returning another IAsyncEnumerable.

I imagine the async counterpart to a method like: public TokenList<TKind> Tokenize(string source)

would be: public IAsyncEnumerable<TKind> Tokenize(IAsyncEnumerable<char> source)
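To make the shape of that signature concrete, here is a minimal sketch of an async tokenizer that consumes an IAsyncEnumerable<char> and yields tokens as they complete. It does not use Superpower's types: SimpleKind, SimpleToken, AsyncTokenizerSketch and ToAsyncChars are hypothetical names invented for illustration, and the "grammar" is just whitespace-separated numbers and words.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical token kinds and token type; not part of Superpower.
public enum SimpleKind { Number, Word }

public readonly struct SimpleToken
{
    public SimpleToken(SimpleKind kind, string text) { Kind = kind; Text = text; }
    public SimpleKind Kind { get; }
    public string Text { get; }
}

public static class AsyncTokenizerSketch
{
    // Characters in, tokens out. A token is emitted as soon as the
    // whitespace that terminates it has been read from the source.
    public static async IAsyncEnumerable<SimpleToken> Tokenize(IAsyncEnumerable<char> source)
    {
        var buffer = new StringBuilder();
        await foreach (var ch in source)
        {
            if (char.IsWhiteSpace(ch))
            {
                if (buffer.Length > 0)
                {
                    yield return MakeToken(buffer.ToString());
                    buffer.Clear();
                }
            }
            else
            {
                buffer.Append(ch);
            }
        }

        // End of input terminates whatever token is still pending.
        if (buffer.Length > 0)
            yield return MakeToken(buffer.ToString());
    }

    // Classifies a non-empty run of characters as Number (all digits) or Word.
    private static SimpleToken MakeToken(string text)
    {
        var allDigits = true;
        foreach (var c in text)
        {
            if (!char.IsDigit(c)) { allDigits = false; break; }
        }
        return new SimpleToken(allDigits ? SimpleKind.Number : SimpleKind.Word, text);
    }

    // Test helper: exposes a string as an asynchronous character sequence.
    public static async IAsyncEnumerable<char> ToAsyncChars(string text)
    {
        foreach (var ch in text)
        {
            await Task.Yield(); // stand-in for a real async read
            yield return ch;
        }
    }
}
```

Consuming it is a plain `await foreach (var token in AsyncTokenizerSketch.Tokenize(chars))`. ValueTask does not need to appear in the signature itself: IAsyncEnumerator<T>.MoveNextAsync already returns ValueTask<bool>, so steps that complete synchronously avoid a Task allocation.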

Obviously ValueTask would then permeate much of the rest of the codebase too, but for now I'm just looking at the tokenizer part.

As the library doesn't currently support this, can you give any pointers on tokenizing in chunks while capturing whatever un-tokenized text remains at the end of each chunk? The signature public Result<TokenList<TKind>> TryTokenize(string source) doesn't seem workable for this: I would need the Result (or other success/failure type) to be inside the TokenList rather than the other way around. Overall, I'm unsure whether I can make this work within the existing framework, or whether it would be less work to write my own AsyncTokenizer.
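One way to picture "tokenize a chunk, keep the un-tokenized tail" is the sketch below. It is an illustration of the carry-over idea only, not Superpower's API: ChunkedTokenizerSketch, TokenizeChunk and TokenizeAll are hypothetical names, tokens are plain strings, and the only rule is that whitespace separates tokens. The key assumption is that a token touching the end of a chunk may be incomplete, so it is held back and prepended to the next chunk.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkedTokenizerSketch
{
    // Splits `text` into whitespace-delimited tokens. Unless `isFinal` is true,
    // a trailing run of non-whitespace is returned as Remainder instead of as a
    // token, because the next chunk might extend it.
    public static (List<string> Tokens, string Remainder) TokenizeChunk(string text, bool isFinal)
    {
        var tokens = new List<string>();
        var start = -1; // index where the current token began, or -1 if none

        for (var i = 0; i < text.Length; i++)
        {
            if (char.IsWhiteSpace(text[i]))
            {
                if (start >= 0)
                {
                    tokens.Add(text.Substring(start, i - start));
                    start = -1;
                }
            }
            else if (start < 0)
            {
                start = i;
            }
        }

        if (start < 0)
            return (tokens, "");

        var tail = text.Substring(start);
        if (isFinal)
        {
            tokens.Add(tail);
            return (tokens, "");
        }
        return (tokens, tail);
    }

    // Drives TokenizeChunk over a sequence of chunks, carrying the remainder
    // of each chunk over to the start of the next one.
    public static List<string> TokenizeAll(IEnumerable<string> chunks)
    {
        var all = new List<string>();
        var carry = "";
        foreach (var chunk in chunks)
        {
            var (tokens, remainder) = TokenizeChunk(carry + chunk, isFinal: false);
            all.AddRange(tokens);
            carry = remainder;
        }

        // No more input: whatever is left over is a complete token.
        var (last, _) = TokenizeChunk(carry, isFinal: true);
        all.AddRange(last);
        return all;
    }
}
```

For example, the chunks "hel", "lo wor", "ld 42" come out as the tokens "hello", "world", "42". A real grammar would need a stronger rule than "ends at whitespace" to decide when a trailing token is safe to emit, which is where most of the difficulty lies.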

Thanks.

nblumhardt commented 4 years ago

Hi! Thanks for dropping by.

Porting this all to async for this purpose would be really interesting. I don't think we'd do that in this project directly, though, as some substantial changes would be required before it would be useful (e.g. incremental tokenization and parsing, since anything received in chunks via async I/O could be quite large).

If you decide to take a shot at an async version I'd love to check it out - keep us posted! :-)

NickRedwood commented 4 years ago

OK, I'll put it on my list of interesting projects to try some time! I expect it wouldn't be too difficult to make some progress on tokenizing; however, async-ifying the rest of the library would be a lot more challenging and time-consuming.