Open thegedge opened 4 years ago
I'm going to explore emulating autoload behaviour. It will add more complexity to constant_resolver
, but it will better ensure correctness.
The gist of the idea is that when we see ::Foo::Bar
, we first try to autoload ::Foo
. If we find a foo.rb
file, parse it. If it has a local definition of Bar
, we're done. Otherwise, continue looking for foo/bar.rb
. If we find it, parse it for local definitions and finish. Otherwise, we assume we can't find ::Foo::Bar
.
There's a performance impact to be measured here.
I could actually see how for a full scan this could be slower than https://github.com/Shopify/packwerk/issues/100, because we may re-parse commonly referenced files a few times. Now, of course, we could memoize.
But in the end I assume for a full scan we would still parse all files twice. Which means that the "load a constant to resolve it" approach could be similarly fast, but more general (also works on constants defined in non-autoloaded code) and simpler.
I guess what I'm saying is that we need a performance comparison between the two approaches.
I guess what I'm saying is that we need a performance comparison between the two approaches.
Yep, I'm definitely interested in seeing a performance comparison here. If https://github.com/Shopify/packwerk/issues/100 is only slightly slower, we should go with it. We should perhaps establish how much slower is acceptable (on the level of single file, package, and full codebase).
Some interesting observations from before and after:
require
-ing). For example, if I have Foo::Bar
in the Spam
namespace, Spam::Foo
may have been defined in a non-autoloaded location, but we still find a foo.rb
and assume that that's what's being referenced.For the last point above, we've technically replaced one assumption with another: if we're trying to resolve a constant <Namespace>::MyThing
, and can't find MyThing
anywhere in <Namespace>
, we assume it's defined elsewhere.
Personally, I think this is a better assumption than assuming it's defined in <Namespace>
.
We also want to explore Module#const_source_location
. If its performance is good (since we'd have to actually load and execute code), it would be far superior than what we try to emulate here in this codebase.
I explored (roughly) some performance characteristics of a move to const_source_location
here:
Warming up --------------------------------------
const_source_location
24.000 i/100ms
parser_gem 32.000 i/100ms
rubyvm_ast 39.000 i/100ms
Calculating -------------------------------------
const_source_location
255.728 (± 4.7%) i/s - 1.296k in 5.079376s
parser_gem 330.604 (± 1.5%) i/s - 1.664k in 5.034414s
rubyvm_ast 400.538 (± 2.0%) i/s - 2.028k in 5.065007s
Comparison:
rubyvm_ast: 400.5 i/s
parser_gem: 330.6 i/s - 1.21x slower
const_source_location: 255.7 i/s - 1.57x slower
So using const_source_location
isn't as slow as I'd expect, but this was also looking at a very uninteresting test file. It's also hard to benchmark this on a single file, since const_source_location
would generally need to autoload many other files.
I think it will be better to explore this across larger packages of code, after https://bugs.ruby-lang.org/issues/16764 lands in Ruby 2.7.2.
```ruby require 'benchmark/ips' require 'parser/current' FILE = ARGV[0] def const_source_location load(FILE, true) end def walk(node) return methods unless node return methods unless node.is_a?(::RubyVM::AbstractSyntaxTree::Node) yield node node.children.each do |child| walk(child) { yield node } end end def parser_gem ast = Parser::CurrentRuby.parse_file(FILE) walk(ast) { } end def rubyvm_ast ast = RubyVM::AbstractSyntaxTree.parse_file(FILE) walk(ast) { } end Benchmark.ips do |x| x.report("const_source_location") { const_source_location } x.report("parser_gem") { parser_gem } x.report("rubyvm_ast") { rubyvm_ast } x.compare! end ```
With the constant
::Foo::Bar::Spam
we look for these files on all load paths:foo/bar/spam.rb
foo/bar.rb
foo.rb
The first of those to exist is the file that defines the constant. If it's an exact match, we can guarantee a match (in any sane, Ruby app). This may not be correct for (2) or (3) though, if both exist.
This potentially gets even more confusing when we consider things that are loaded via
require
in paths that reflect a similar structure as autoload paths. Unit tests, for example, can often fit the same hierarchical structure as their counterparts.Let's look into how we can either a) not make any assumptions and avoid false positives, or b) another solution that completely avoids false positives.