Shopify / packwerk

Good things come in small packages.
MIT License
1.54k stars 111 forks source link

Use prism parser via translator #388

Closed exterm closed 5 months ago

exterm commented 5 months ago

What are you trying to accomplish?

Proof of concept for discussion https://github.com/Shopify/packwerk/discussions/387. Packwerk spends about 2/3 of the time it runs on a large code base in parsing Ruby. Prism is ~5x faster than the current parser (whitequark/parser). So this should be a significant speed boost for packwerk.

It looks like this could be a simple, almost trivial change.

What approach did you choose and why?

We could use Prism directly, but that would require significant changes to Packwerk. Luckily @kddnewton has included a translator with prism that provides compatibility with whitequark/parser, so that it should be a drop-in replacement. We won't get the same performance improvements, but we get some, and almost for free.

What should reviewers focus on?

Type of Change

Additional Release Notes

Include any notes here to include in the release description. For example, if you selected "breaking change" above, leave notes on how users can transition to this version.

If no additional notes are necessary, delete this section or leave it unchanged.

Checklist

exterm commented 5 months ago

The problem seems to be that Parser::Prism raises a StandardError here, whereas Parser::CurrentRuby raises a Parser::SyntaxError in that case.

Parser::Prism raises a more specific error on parser-prism's main branch. We could cath that one explicitly. However, it seems for compatibility parser-prism may want to raise the same exceptions as Parser::CurrentRuby?

kddnewton commented 5 months ago

@exterm Can you use prism v0.20.0 itself? It has Prism::Translation::Parser which is effectively this. I'm going to deprecate parser-prism

exterm commented 5 months ago

It has Prism::Translation::Parser which is effectively this. I'm going to deprecate parser-prism

Oooh!

exterm commented 5 months ago

I'm running into lots of errors that look like I'm not requiring the right thing.

require "prism/translation/parser" =>

NameError: uninitialized constant Prism::Compiler

      class Compiler < ::Prism::Compiler
                              ^^^^^^^^^^
Did you mean?  Complex
    /home/[...]/ruby/3.1.2/lib/ruby/gems/3.1.0/gems/prism-0.20.0/lib/prism/translation/parser/compiler.rb:8:in `<class:Parser>'

When I add a require "prism/compiler" I get

NoMethodError: undefined method `parse' for Prism:Module

        result = unwrap(Prism.parse(source, filepath: source_buffer.name))
                             ^^^^^^
    /home/[...]/ruby/3.1.2/lib/ruby/gems/3.1.0/gems/prism-0.20.0/lib/prism/translation/parser.rb:45:in `parse'
exterm commented 5 months ago

Ah! It needs a require "prism" too. Might want to add that to the instructions in https://github.com/ruby/prism/blob/main/docs/parser_translation.md

kddnewton commented 5 months ago

Ahh sorry it's actually the opposite, you should only require prism and not require the nested one. Everything is autoloaded so it should work just fine with just the top-level require.

exterm commented 5 months ago

Done. I had to change a test because the error message was different. Ideally it would be the same as from Parser::CurrentRuby, no?

kddnewton commented 5 months ago

Error messages are going to be different because they're coming straight from prism. They're meant to be more informative than CRuby, and as such they're likely to be different from the parser gem. The message themselves we're not going to make any guarantees about, as they are likely to change as we continue to improve error recovery.

exterm commented 5 months ago

OK! I can live with that.

exterm commented 5 months ago

Green tests! Thank you a lot for your help (and all the work on Prism of course), Kevin. Will try this out tomorrow on a substantial Rails app.

exterm commented 5 months ago

@gmcgibbon FYI it seems to "just work" according to the test suite. Real world testing before merge seems wise.

exterm commented 5 months ago

Seems to break on numbered parameters in blocks:

undefined method `parameters' for @ NumberedParametersNode (location: (22,27)-(22,65))
└── maximum: 1
:Prism::NumberedParametersNode /home/vscode/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/prism-0.20.0/lib/prism/translation/parser/compiler.rb:1752:in `visit_block'
/home/vscode/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/prism-0.20.0/lib/prism/translation/parser/compiler.rb:244:in `visit_call_node'

The code in question is something like self.something.find { _1.graphql_name == "somethingElse" }

I currently believe that this is a bug in prism, respectively the translator

exterm commented 5 months ago

I'll wait for the next prism release with the bugfix before I make this "ready for review".

I tested with the fix on the largest Rails app I currently have access to (~1m lines of Ruby code) and it works just fine. Speedup seems to be about 30% for a full packwerk check run according to ad-hoc benchmarks in a highly virtualized environment. About 50% on a more or less recent MacBook Pro.

kddnewton commented 5 months ago

@exterm I'll release a patch release on Monday first thing. It'll be v0.20.1.

Thank you very much for your work on this! I'm still very interested in replacing the parser with prism as a whole (that should be much better than using the translation layer) but I recognize that's quite a bit more involved. Is that something you're still interested in working on? I'd be happy to help support in any way I can.

exterm commented 5 months ago

@kddnewton What's the advantage to packwerk from skipping the translation layer?

kddnewton commented 5 months ago

It should be quite a bit faster (i.e., an order of magnitude), because it won't have to go through the whole translation.

The other very tangential, loosely-defined benefit is that it uses the same AST as Ruby, so anyone contributing to packwerk will be gaining knowledge of the AST that all of the Ruby implementations use. This means new contributors won't have to learn a new AST, and in a world where prism is the parser everywhere (the world I'm trying to create) this means a lower barrier to entry for new contributors.

kddnewton commented 5 months ago

The last thing is that we've worked hard to provide a really good Ruby API. Instead of dealing with a single class with a type field, all of our nodes have their own classes. We have utility functions on all of the classes that should make it quite a bit easier to use. Also, if you need anything, you have a line directly to the main contributor and I'll add whatever functions you may need.

exterm commented 5 months ago

I whipped up a quick benchmark and it looks like omitting the translation could indeed provide a huge boost.

kddnewton commented 5 months ago

@exterm v0.21.0 is out now with these bug fixes

exterm commented 5 months ago

@kddnewton thank you, will plug it in when I find some time. Probably later today.

Is that something you're still interested in working on? I'd be happy to help support in any way I can.

I was hoping to make a full parser replacement paid work, but I'm having trouble making a business case for it. Gusto now has a Rust reimplementation of packwerk that is a lot faster, and probably faster than we can make packwerk.

I'm still interested in this for fun, but I'll have to find time for it.

shageman commented 5 months ago

Testing this. I am atm stuck on prism 0.17 due to our version of rbi. Had to read through the discussion to see the version dependency.

exterm commented 5 months ago

mmmh, interesting. That might make a release incorporating this a little harder.

kddnewton commented 5 months ago

I just opened a PR to add support for Ruby 2.7, so this won’t be an issue once I release that.

On Mon, Feb 5, 2024 at 7:20 PM Philip Müller @.***> wrote:

mmmh, interesting. That might make a release incorporating this a little harder.

— Reply to this email directly, view it on GitHub https://github.com/Shopify/packwerk/pull/388#issuecomment-1928551879, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABG3P3QLU7F64UQFR2KNRJ3YSFZNZAVCNFSM6AAAAABCVUGFHKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRYGU2TCOBXHE . You are receiving this because you were mentioned.Message ID: @.***>

exterm commented 5 months ago

I was going to update prism and mark this PR ready for review, but in that case I'll wait for Ruby 2.7 support in prism.

kddnewton commented 5 months ago

@exterm version 0.22.0 supports Ruby 2.7, so you should be able to use that.

exterm commented 5 months ago

This is ready to go

exterm commented 5 months ago

@gmcgibbon can we get this merged? Shopify will save money in CI 😉

exterm commented 5 months ago

Up to date measurements from a reasonably sized Rails app, using spring. Sadly I can only currently access it through a Github Codespace, so it's a highly virtualized environment. I did not get a lot of variation on repeated runs though.

Packwerk 3.1.0

cached: ~2.6s uncached: ~32s

Packwerk 3.1.0 with Prism::Translator::Parser from Prism 0.24.0

cached: ~2.6s uncached: ~20s

Cached run times are the same, as expected - no parsing happens on repeated cached runs.

exterm commented 5 months ago

For a single 3k line file, the difference is about 0.8s vs 1.1s - it's not dramatic, but it's faster. Moving off the translator layer would be better but is more work. It could be done as a follow-up to this PR.

gmcgibbon commented 5 months ago

Cached run times are the same, as expected - no parsing happens on repeated cached runs.

Right, I think I'm fixating on the wrong type of run. I believe there's an improvement on uncached runs on our app too.

Uncached before:

📦 Finished in 283.79 seconds

Uncached after:

📦 Finished in 187.03 seconds

That's considerably faster, and CI would use uncached, so that's good enough for me.

gmcgibbon commented 5 months ago

Thank you @exterm, I'll run a release for this now.

exterm commented 5 months ago

@gmcgibbon do you want to add a note about the expected speedup (~30% for uncached runs on large apps) to the release notes? Or maybe just say "this makes it faster" or something... Right now it's not really clear from the release notes why the change was made.

I don't have a strong opinion, just making a suggestion.

tylerwillingham commented 5 months ago

25% faster on the main Rails app I work in 👍 Thanks folks