datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0

Binary Model? #68

Closed BenjaminHolland closed 5 years ago

BenjaminHolland commented 5 years ago

Constraining parsing to only strings and character arrays seems unnecessarily restrictive. Parsers that operate on bytes and byte arrays could be super useful for parsing binary data into arbitrary objects. Creating a byte-array based version of the classes in Model would be the first step to doing this.
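To make the idea concrete, here is a minimal sketch of what a byte-based parser type might look like. Every name in it (`ByteResult<T>`, `ByteParser<T>`, `ByteParsers`) is hypothetical and not part of Superpower, and using `ReadOnlyMemory<byte>` as the input is an assumption made for the example, not the library's design.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical result type: a parsed value plus the unconsumed input.
public readonly struct ByteResult<T>
{
    public bool HasValue { get; }
    public T Value { get; }
    public ReadOnlyMemory<byte> Remainder { get; }

    private ByteResult(bool hasValue, T value, ReadOnlyMemory<byte> remainder)
    {
        HasValue = hasValue;
        Value = value;
        Remainder = remainder;
    }

    public static ByteResult<T> Success(T value, ReadOnlyMemory<byte> remainder) =>
        new ByteResult<T>(true, value, remainder);

    public static ByteResult<T> Failure(ReadOnlyMemory<byte> remainder) =>
        new ByteResult<T>(false, default(T), remainder);
}

// Hypothetical byte-level counterpart of a text parser delegate.
public delegate ByteResult<T> ByteParser<T>(ReadOnlyMemory<byte> input);

public static class ByteParsers
{
    // Consumes a big-endian unsigned 16-bit integer.
    public static readonly ByteParser<ushort> UInt16BigEndian = input =>
        input.Length < 2
            ? ByteResult<ushort>.Failure(input)
            : ByteResult<ushort>.Success(
                BinaryPrimitives.ReadUInt16BigEndian(input.Span), input.Slice(2));

    // Consumes exactly `count` bytes and returns them without copying.
    public static ByteParser<ReadOnlyMemory<byte>> Take(int count) => input =>
        input.Length < count
            ? ByteResult<ReadOnlyMemory<byte>>.Failure(input)
            : ByteResult<ReadOnlyMemory<byte>>.Success(
                input.Slice(0, count), input.Slice(count));

    // Runs `first`, then runs the parser chosen from its value on the remainder.
    public static ByteParser<U> Then<T, U>(
        this ByteParser<T> first, Func<T, ByteParser<U>> next) =>
        input =>
        {
            var r = first(input);
            return r.HasValue
                ? next(r.Value)(r.Remainder)
                : ByteResult<U>.Failure(input);
        };
}
```

With these pieces, a length-prefixed field could be written as `ByteParsers.UInt16BigEndian.Then(n => ByteParsers.Take(n))`: the first parser reads the length, and the second consumes that many bytes of payload.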

I'm wondering if this is a feature that would be appropriate to include in this project, or possibly fork into a different one. I haven't seen a lot of discussion about doing this in ANY monadic parser library, and so I'm also wondering if I'm just way out to sea on this one.

nblumhardt commented 5 years ago

Hi! It's not at all an unreasonable idea; Pidgin (C#) and nom (Rust) are two examples I can think of that allow this.

The main considerations against adding it in Superpower are:

  1. Binary parsers generally need to operate on streams, since files/network formats often need to deal with gigabytes of data at a time; Superpower is designed for small language sources where loading a string up-front is an acceptable trade-off for the simplicity it brings

  2. The main benefit Superpower brings over other options is that for end-user-facing parsers it provides good error reporting; binary format parsers don't generally need to report errors back to users (I don't want to know what bytes were found/expected in a broken JPEG file ;-)) so its benefits are reduced, and there are probably faster options due to the reduction in error reporting machinery

Neither's an insurmountable barrier to considering this, but I think we'd need a really compelling use case to justify the addition.

Forking and stripping the library back to better suit these goals does sound like a fun project, though :-)

BenjaminHolland commented 5 years ago

https://github.com/BenjaminHolland/superpower/tree/feature/binary

Initial, minimal implementation. I built it with Memory<T>, though I'm not sure that was the right idea. It doesn't build yet because of XML comment errors and the non-existent Core 2.1 library.

One idea here is to interoperate with the new pipelines library. That library handles the streaming and the asynchronous work, and exposes the data stream in smallish chunks that need to be polled, parsed, and dispatched separately, so the first point isn't really an issue.
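As a rough sketch of that poll, parse, and dispatch loop, assuming "the pipelines library" means System.IO.Pipelines (which may need to be added as a NuGet package): the `FrameReader` class, its method names, and the frame layout (a 2-byte big-endian length followed by the payload) are all made up for the example and are not taken from the linked branch.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class FrameReader
{
    // Tries to cut one frame off the front of `buffer`.
    // Returns false, leaving `buffer` untouched, if a whole frame has not arrived yet.
    public static bool TryReadFrame(
        ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> payload)
    {
        payload = default;
        if (buffer.Length < 2) return false;

        // The header may straddle two chunks, so copy it out before decoding.
        Span<byte> header = stackalloc byte[2];
        buffer.Slice(0, 2).CopyTo(header);
        int length = BinaryPrimitives.ReadUInt16BigEndian(header);

        if (buffer.Length < 2 + length) return false;

        payload = buffer.Slice(2, length);
        buffer = buffer.Slice(payload.End);
        return true;
    }

    // Polls the pipe, parses every complete frame, and dispatches it to `onFrame`.
    // The payload passed to `onFrame` is only valid until the next AdvanceTo call.
    public static async Task ReadFramesAsync(
        PipeReader reader, Action<ReadOnlySequence<byte>> onFrame)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            while (TryReadFrame(ref buffer, out var payload))
            {
                onFrame(payload);
            }

            // Everything before buffer.Start is consumed; everything up to
            // buffer.End has been examined, so the pipe waits for more data.
            reader.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted) break;
        }

        reader.Complete();
    }
}
```

The parser only ever sees one complete frame at a time, so it can stay synchronous and in-memory while the pipe handles buffering and back-pressure.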

It could also be used for image processing or other smallish data formats where you need to parse header, content, etc. Building a parsing language for an image, and having that language be able to say "Hey, this header is whack, yo" would, to me, seem pretty useful.

Speed is the one thing that I think is an issue. You're already trading off speed for expressiveness by using a combinator library in the first place.

nblumhardt commented 5 years ago

That's awesome :-) !!!

nblumhardt commented 5 years ago

Closing this as I don't think it's a feature that will make it into Superpower, but I hope it's going well, @BenjaminHolland - keep us posted!