avitex / rust-dangerous

Rust library for safely and explicitly parsing untrusted data
MIT License
51 stars 4 forks source link

My notes on rust-dangerous #3

Closed Byron closed 4 years ago

Byron commented 4 years ago

To me, dangerous comes at just the right time as, coming from nom, I was looking for a crate to parse git-config files with zero-copy and nice error messages. nom lacks in that department unless considerable time is invested, especially for me who never found noms error type too intuitive.

Especially after having read how the parser was implemented in rslint and loving it, the similarities to this kind of parser became apparent immediately.

While reading the docs and studying the API I came to really, really like dangerous despite not even having used it. To me it seems I could implement a parser much more intuitively compared to a parser combinator like nom, while getting better errors out of the box.

Something I also love about this crate is it's leanness - ~4k lines of code and nearly no dependencies are great for small binaries and fast compiles times, so overall this is a great proposition.

The only nitpick I could offer is to see more benchmarks (or I couldn't make sense of what's there), even though to me performance is secondary here while great error messages are paramount.

Something I would wonder about is if there is an easy way to also provide line numbers in the error case. My research indicates that it should be easy to implement since the Error-Context does have access to all input, so I think one should be able to count 'lines till span with error' once there is an error.

Now I am somewhat at a fork in the road as I noticed that combine, another parser combinator, comes with pretty decent error messages by default, and comes with examples on how to customize error reporting further. Given my experience with nom it appears that combine could fit the bill better given my prior experience. However, what really speaks for trying dangerous anyway is its fast build times and small footprint overall.

Maybe it's best to spike both dangerous and combine and try to push error reporting to the desired level in both, and see what feels more straightforward. If you are interested, I am happy to post my experience here, and of course am happy to hear your thoughts particularly on adding line numbers and getting nice super nice user facing errors.

Thanks for your sharing this awesome crate!

avitex commented 4 years ago

Hello Byron, sorry for the late reply, was having a very lazy weekend!

Thank you for your kind words, this was one of the crates at the end of my yak-shave so I want to make sure I really really get this one right. I'm definitely interested in working on improvements and maintaining this crate now and into the future and I would love to see this crate fit a number of different use cases.

Benchmarks and more importantly better testing and test coverage is a top priority currently. It's also been made clear to me I need to add more documentation both in code and outside of.

The line numbers is a good suggestion! All of the context required to increase error verbosity for text formats is available and I don't see any reason why this shouldn't be a feature. Most of the work around this crate was the error system so I'd like to see it polished. Truth be told, I almost ripped out a lot the error reporting and input formatting into a separate crate but decided not to, just to keep it all in one place.

I would absolutely love to hear about your experience with it and I'm very much on board for tailoring the library further. I'm currently working on a couple of crates that consume this library so I'm sure I'll be finding some corner cases that can be improved. I also did start tinking around with the idea of a crate that consumes this one that has nom inspired functional interface but I haven't done much work on it.

Thanks again Bryon, I look forward to hearing what you think! :)

Byron commented 4 years ago

Hello Byron, sorry for the late reply, was having a very lazy weekend!

These can be the best :)!

I'm definitely interested in working on improvements and maintaining this crate now and into the future and I would love to see this crate fit a number of different use cases.

I am very glad to hear that! Having had a quick look at untrusted, I believe dangerous is very unique in the awesome first impression that it makes. I guess what I am trying to say is that you are onto something here.

Benchmarks and more importantly better testing and test coverage is a top priority currently.

As for the benchmarks, to me these would be interesting out of curiosity mainly, as I wonder how far from the optimum one can possibly be with todays optimizers. Thus I would expect dangerous to perform well already, even though a benchmark or too might reveal possible areas of improvements for when there is time to spare.

It's also been made clear to me I need to add more documentation both in code and outside of.

Interesting, I didn't find the docs lacking at all, quite the opposite and fun to read :D. An example exploring custom errors further maybe even with line numbers ;) would certainly be most interesting to me.

Most of the work around this crate was the error system so I'd like to see it polished.

And I am looking forward to that as well!

Truth be told, I almost ripped out a lot the error reporting and input formatting into a separate crate but decided not to, just to keep it all in one place.

That sounds like the right thing to do. The only reason for me to split things out into their own crate is dependencies, but this doesn't seem to be the case here. Unless there is a new take on error handling, of course, which would definitely be interesting in its own right.

I would absolutely love to hear about your experience with it and I'm very much on board for tailoring the library further.

I do plan to contribute an 'ini' parser example, and one that does the same with zero-copy. It will take me a moment to get there as I want to try combine first with a zero-copy version of their existing 'ini' parser example. That should give me the experience I need to pick a crate.

I also did start tinking around with the idea of a crate that consumes this one that has nom inspired functional interface but I haven't done much work on it.

Woah, that sounds like a time-sink! From a user's perspective, I would be very interested to read the 'unique selling points' first to learn which trade-offs it choses over nom and combine. Thus I hope you find the time to learn about these as well to figure out what that could be.

Thanks again Bryon, I look forward to hearing what you think! :)

I will! Thanks for your answer, a pleasure exchanging thoughts with you!

avitex commented 4 years ago

@Byron the error line is a part of master now :)

Example:

error attempting to peek u8: found no bytes when at least 1 byte was expected
> "[\n"
      ^
additional:
  error line: 2, error offset: 2, input length: 2
backtrace:
  1. `peek u8` (expected non-empty input)
Byron commented 4 years ago

Whoop whoop! Thanks so much! I had a look at the implementation, and it does seem quite straightforward indeed, and is pretty much what I thought would be done. This certainly is a testament to the documentation from which all my knowledge about dangerous stems after all.

So I just changed my mind and will rather create a simple 'ini' parser using dangerous first! My intuition is that it will be a very pleasant experience to the point where I won't bother trying the same in combine and use dangerous for git-config right away. combine probably has nice error messages out of the box, but nothing that can't be approximated with Reader::context(…). Something that I always consider valuable is the fast compile times dangerous provides, and I reckon it's going to be zero-dependencies for my use-case then.

avitex commented 4 years ago

@Byron Can't thank you enough for your kind words :)

I plan to keep this library relatively simple without a large surface area, while exposing enough to customize and interface. I see this crate more as providing a foundation for building parsers and parser libraries from, providing a error system and a core reader interface for building upon.

Fantastic to hear you'll be using dangerous :') I'm very keen to hear about any gripes at all