Argument-Clinic / cpython

The Python programming language
https://www.python.org/
Other
1 stars 0 forks source link

Refactor the `DSLParser` state machine, removing `self.next()` #14

Open AlexWaygood opened 1 year ago

AlexWaygood commented 1 year ago

I don't like the current way the state machine works, at all. The indirection via self.next() is a leaky abstraction, as the state_foo methods don't really have the same signature at all (or shouldn't, at least). It's also really hard to get your head around and understand if you're not familiar with how clinic works (as I wasn't, until a month or two ago!).

With this refactor, the state_foo methods (which are renamed to be handle_foo methods) are now able to have different type signatures, removing the need for constant assert function is not None assertions everywhere.

erlend-aasland commented 1 year ago

I didn't take a close look yet (on mobil), but this looks promising. I find the naming slightly off, though. Wouldn't a parse_ prefix (instead of handle_) make it more obvious what's happening?

AlexWaygood commented 1 year ago

Wouldn't a parse_ prefix (instead of handle_) make it more obvious what's happening?

parse_ is the obvious verb here, yeah, but we've already got a family of methods on the DSLParser class with that naming convention (parse_parameter, parse_star, etc.) which work slightly differently. (They're never called by self.next() in the old state machine, and they're never called by self.handle_line() in the new one). I wanted to use a different verb for this family of methods to clearly differentiate the two groups of methods.

AlexWaygood commented 1 year ago

Probably a more fundamental issue is that the DSLParser class is too darn big, and needs to be broken up into several classes (one with the parse_* methods, one with the handle_* methods, one with the directive_* methods). Not wild about tackling that now, though; this PR already feels big enough!

AlexWaygood commented 1 year ago

And this of course brings us back to #4: the DSLParser class should probably also be in a separate module...

erlend-aasland commented 1 year ago

Probably a more fundamental issue is that the DSLParser class is too darn big, and needs to be broken up into several classes (one with the parse_* methods, one with the handle_* methods, one with the directive_* methods). Not wild about tackling that now, though; this PR already feels big enough!

Yes, we need to break the state machine out of DSLParser (or some of the other stuff). For myself, I need to study this for many more weeks until I can even start pointing to a useful refactor :)

AlexWaygood commented 1 year ago

Coverage of the DSLParser class overall is pretty good both before and after this PR, but some of the new branches I'm adding in handle_line() aren't yet covered; I'll need to add coverage for them. Some of them may in fact be unreachable...

(This reveals another benefit of this PR: it exposes which situations aren't yet being considered by the clinic tests, by unraveling each possibility into a separate branch. Currently all the state_foo methods are only ever called via self.next(), which makes our coverage stats look better than they really are!)

image

erlend-aasland commented 1 year ago

Yes, we need to break the state machine out of DSLParser (or some of the other stuff). For myself, I need to study this for many more weeks until I can even start pointing to a useful refactor :)

Actually, my suggestion is not a very good one 😄 The parser state machine of course needs to stay in the DSLParser. There is other stuff we can separate out, though.

AlexWaygood commented 1 year ago

Coverage of the DSLParser class overall is pretty good both before and after this PR, but some of the new branches I'm adding in handle_line() aren't yet covered; I'll need to add coverage for them. Some of them may in fact be unreachable...

After studying the code for a while, I decided that both of the uncovered branches were in fact unreachable -- which indicates that the way the current code in clinic.py has some incorrect uses of self.next(). I filed https://github.com/python/cpython/pull/107635 to fix that.

erlend-aasland commented 1 year ago

[...] which indicates that the way the current code in clinic.py has some incorrect uses of self.next() [...]

Well, in a classic state machine, having "setup" states (like param-docstring-start) is IMO correct use of the state machine :)

AlexWaygood commented 1 year ago

Okay, following https://github.com/Argument-Clinic/cpython/pull/14/commits/9dc86c8c2ebf81e3ffaab7fd3b8f2d450424d3ae, there are now no significant new uncovered lines in this PR! (To get there, I had to reintroduce some indirection, which I don't love... but you can't always get what you want :)

There are still one or two small changes that can be broken out separately, though; I'll see about doing that.

erlend-aasland commented 1 year ago

Okay, following 9dc86c8, there are now no significant new uncovered lines in this PR!

I'll have a look later (possibly tonight)!