certbot / certbot

Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.
Other
31.31k stars 3.39k forks source link

Revamp Nginx parsing #5823

Open ohemorange opened 6 years ago

sydneyli commented 6 years ago

Linking relevant issues. General parsing things:

5774

5157

3417

5543 (this one is closed, but we could do it better)

Server block things to keep in mind:

5185

4861

sydneyli commented 6 years ago

Some notes from @ohemorange and I's call earlier today.

So Nginx and Dovecot have kind-of similar configuration languages. Dovecot's is a little bit simpler, but if we revamp the parsing for Nginx, Dovecot's parsing should be able to take advantage of all the new nice tooling.

Let's parse all the configuration files into immutable tree-like objects, like an AST but not really. There will be a general "Statement" interface that wraps either the set of tokens it contains, or more Statements. This object can contain metadata (like tabbing/space info, the origin file).

We'll also have a "Delta" interface which allows you to batch/queue sequences of changes to a Statement. A read/write/read sequence would require you to actually apply these Deltas before the second read, but we can do this lazily.

sydneyli commented 5 years ago

Restarting work on this, after a long hiatus.

To restate, overall goal of this project is to completely overhaul certbot-nginx/certbot_nginx/parser.py with minimal changes needed to the other files.

General plan / model for slowly migrating this in smaller PR chunks:

So in total I'm guessing this will be 6-ish PRs of around size ~200. I'm currently experimenting with breaking up the monstrosity that is part 3, to see if there are any other compatibility concerns other than everything in the Nginx plugin assuming that we're passing around UnspacedLists. Updates to part (3) might affect the upstream object design in (1) and (2), so there might not be any PR activity until I get that sorted.

I'll also try to keep everything as clean and documented as possible, since refactors are scary. The light at the end of the tunnel is that in the final PR parser.py is much lighter and reads much cleaner. We also no longer need to manually feed in whitespace data to the parser when adding directives etc. If the upstream parsing objects are of use to other plugins, we can also pull them into plugins/common, but for now they'll live in the nginx plugin.

ohemorange commented 5 years ago

@sydneyli I'm excited to see this happening!

Some thoughts:

Could you say more about "Add more functionality for finding and manipulating "statements" in large blocks of newline-separated statements"? I'm not quite sure what that means.

Also, it sounds like you're taking on a lot of the load here; I bet we could find a way to restructure things so that through reviews we can share the load. For instance, if part 3 affects 1 and 2, maybe somehow starting with part 3 could make this easier to get a chunk of it out? Like, maybe picking one function in NginxParser to rewrite, and bundling that with only the objects it would need under the hood to keep its current functionality. I'd rather see a PR get in that we have to change later than have it all have to sit on your machine until it's perfect.

sydneyli commented 5 years ago

Could you say more about "Add more functionality for finding and manipulating "statements" in large blocks of newline-separated statements"? I'm not quite sure what that means.

Sure! To translate this into nginx-speak, generally there's a lot of logic in parser.py that deals with finding, adding, removing, and replacing directives. I'm moving that logic into the Statements object-- which generally represents a list of list of tokens. A directive would be represented by a list of tokens.

For instance, if part 3 affects 1 and 2, maybe somehow starting with part 3 could make this easier to get a chunk of it out?

To prevent stacking up a monolith of \~perfected\~ work on my own machine, I'm hoping to get parts 1&2 up into PRs today for review, and then if we need any upstream changes for part 3 I'll probably just include those changes in future PRs. Most of the work thus far has been wrangling the giant PR into smaller dependent PRs, which wouldn't change if I decided to do part (3) first.

ohemorange commented 5 years ago

Sounds great, thanks!