VorpalBlade / chezmoi_modify_manager

Tools for chezmoi to handle mixed settings and state
GNU General Public License v3.0
40 stars 5 forks source link

Looking for feedback on an upcoming `winnow` API change #20

Closed epage closed 1 year ago

epage commented 1 year ago

When developing winnow, I found that certain behavior had a dramatic performance impact (e.g. using Located). My theory on the root cause was that the slowdown is because of how much bigger the return type of parsers gets. winnow-rs/winnow#268 tries to change the API to allow people to reduce the size of their return types. In that PR, I've made comments on the code that highlight what the impact of this change is.

This is a major change going from a pure-functional API to more imperative (while still allowing a mostly declarative use). It allows for a new class of bugs but prevents a different class.

What I'm wanting to understand is what users of winnow think of this API, ranging from "its good" to "if the gains are there" to "I'd drop winnow.

VorpalBlade commented 1 year ago

Reading the docs (specifically the diff of the tutorial) in that PR you mentioned, it is a major change.

What I use winnow for and how I use it

For my purposes the performance has been great. It was still 50x faster than the previous python iteration. I simply use winnow for a small custom config DSL specifying rules for how to do the main job. The "heavy" lifting that I do (merging ini-files subject to those rules) is done with a completely custom and hand written parser).

The config file format is a simple newline separated format, with one directive per line (or a comment, starting with #).

Mostly I use combinators from what I can tell looking at the code (it has been a few months).

I don't use Located as far as I can see (in fact, I probably should to get better error messages, the current parse errors are cryptic, and I haven't spent any time on trying to improve that).

My understanding of the change and some questions

As I understand the change is that the type of parsers changes from (ignoring error handling): &str -> (&str, Parsed) to &mut str -> Parsed. It wouldn't be a problem for me to use a mutable buffer. Though, maybe I don't understand the change since it leaves me with two questions:

My take on the change

I wouldn't mind a change (in a new semver version of course) as long as there is a clear tutorial of how to migrate (preferably as mechanical steps to apply, not rewrite the whole thing). And it has some tangible benefits.

For inspiration of what sort of things users have in their code that need migrating, consider looking at https://github.com/VorpalBlade/chezmoi_modify_manager/blob/main/src/config/parser.rs (would save me a bunch of time ;-)). And of course look at what the parsers of other projects use.

If you want to see my user facing documentation of the syntax I'm parsing, look at the output of --help-syntax which can be found at https://github.com/VorpalBlade/chezmoi_modify_manager/blob/main/src/lib.rs#L85 or some of the example config files

epage commented 1 year ago

Thank for your taking the time to give such a thorough response!

I don't use Located as far as I can see (in fact, I probably should to get better error messages, the current parse errors are cryptic, and I haven't spent any time on trying to improve that).

Most error reporting scenarios don't actually need Located. My hope is to provide a better "Getting Started" error with this upcoming release (winnow-rs/winnow#103)

How does that work, I thought &str consisted of (start ptr, length))? Can you modify the starting location of a &mut str in the caller?

Yes, the (start ptr, length) is what is mutable rather than what the ptr points to. You can see this in action at https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cb768eb21bd28e2fefcd2cc0965b1444

Also, how does backtracking as done by alt work in this case? I'm fairly sure my parser relies on backtracking quite heavily, in a nested way even (e.g. alt at the top level for selecting between different directives, as well as alt inside the directives for selecting between different forms of the directive with optional parameters etc).

alt saves a checkpoint() and then reset()s the stream to the checkpoint before every parser attempt (see https://github.com/winnow-rs/winnow/pull/268#discussion_r1248111721)

Any manual backtracking would have to do the same.

Alternatively, you can use peek_next instead of parse_next to get the old API. If you find one specific parser is better written the old way, you can wrap it in unpeek (naming is hard).

I wouldn't mind a change (in a new semver version of course) as long as there is a clear tutorial of how to migrate (preferably as mechanical steps to apply, not rewrite the whole thing). And it has some tangible benefits.

Yes, it will be there.

My general philosophy is

  1. Have the compiler guide people through breaking changes, where possible, by supporting the old and new way in the existing version and deprecating the old way
  2. Always provide a reference-style list of breaking changes
  3. When breaking changes are "big" enough, provide a more tutorial or cookbook approach to migrating

I am also familiar with the pain of these migrations and see first hand a lot of weird scenarios, not just because of the extensive set of tests but also because I have my own projects that use winnow, like toml_edit.

VorpalBlade commented 1 year ago

Looking at it I think this definitely needs 3 (cookbook/tutorial).

VorpalBlade commented 1 year ago

Also I can't find the benchmarks where you show the performance benefit of this? Maybe I missed something (the PR was long, I admit I skimmed most of it).

epage commented 1 year ago

I am still working on verifying the performance. I figured I'd start soliciting feedback in parallel to those efforts (the PR still has a lot of work left to do on it).

In the comments, a user has already run benchmarks and seen benefits.

VorpalBlade commented 1 year ago

Closing this, as I don't think there is anything else to add currently.