fxn / zeitwerk

Efficient and thread-safe code loader for Ruby
MIT License
1.97k stars 117 forks source link

Uninitialized constant for nested class in sidekiq after code reload with Rails 7 and bootsnap #198

Closed Earlopain closed 2 years ago

Earlopain commented 2 years ago

Hi, I run into issues with sidekiq after making changes with code reload. The project is using Rails 7, Ruby 3.0.3 and Zeitwerk 2.5.1

The relevant directory structure is as follows:

app
├── test
│   ├── a.rb
│   └── b
│       └── c.rb
└── workers
    └── test_worker.rb

a.rb

class A
end

c.rb

module B
  class C
  end
end

Once the application is started everything works fine, both classes are available to rails and sidekiq with no issues. If the code is autoreloaded by making code changes B::C isn't available to sidekiq anymore, while both classes are still accesible by rails. Adding an empty module app/test/b.rb solves the issue, but I shouldn't have to do that.

I'm not sure if this is a problem with zeitwerk, but it looked like a good place to start since it seems to be some kind of loading issue.

Here is a minimal repro, if needed. https://github.com/Earlopain/zeitwerk-repro. It's docker, after starting the application will be available under localhost:9000 There's one page, which prints out A and B::C. Loading also calls a sidekiq worker, which only logs A and B::C. Again, on first load everything works as expected but after code changes sidekiq cannot find B::C anymore.

Here's the log for both rails and sidekiq ``` Initial boot: 22:31:17 puma.1 | Zeitwerk@rails.main: autoload set for ApplicationController, to be loaded from /demo/app/controllers/application_controller.rb 22:31:17 puma.1 | Zeitwerk@rails.main: autoload set for A, to be loaded from /demo/app/test/a.rb 22:31:17 puma.1 | Zeitwerk@rails.main: autoload set for B, to be autovivified from /demo/app/test/b 22:31:17 puma.1 | Zeitwerk@rails.main: autoload set for TestWorker, to be loaded from /demo/app/workers/test_worker.rb 22:31:17 sidekiq.1 | Zeitwerk@rails.main: autoload set for ApplicationController, to be loaded from /demo/app/controllers/application_controller.rb 22:31:17 sidekiq.1 | Zeitwerk@rails.main: autoload set for A, to be loaded from /demo/app/test/a.rb 22:31:17 sidekiq.1 | Zeitwerk@rails.main: autoload set for B, to be autovivified from /demo/app/test/b 22:31:17 sidekiq.1 | Zeitwerk@rails.main: autoload set for TestWorker, to be loaded from /demo/app/workers/test_worker.rb First request: 22:31:19 puma.1 | Zeitwerk@rails.main: constant ApplicationController loaded from file /demo/app/controllers/application_controller.rb 22:31:19 puma.1 | Zeitwerk@rails.main: constant A loaded from file /demo/app/test/a.rb 22:31:19 puma.1 | Zeitwerk@rails.main: module B autovivified from directory /demo/app/test/b 22:31:19 puma.1 | Zeitwerk@rails.main: autoload set for B::C, to be loaded from /demo/app/test/b/c.rb 22:31:19 puma.1 | Zeitwerk@rails.main: constant B::C loaded from file /demo/app/test/b/c.rb 22:31:19 puma.1 | Zeitwerk@rails.main: constant TestWorker loaded from file /demo/app/workers/test_worker.rb 22:31:19 sidekiq.1 | Zeitwerk@rails.main: constant TestWorker loaded from file /demo/app/workers/test_worker.rb 22:31:19 sidekiq.1 | Zeitwerk@rails.main: constant A loaded from file /demo/app/test/a.rb 22:31:19 sidekiq.1 | Zeitwerk@rails.main: module B autovivified from directory /demo/app/test/b 22:31:19 sidekiq.1 | Zeitwerk@rails.main: autoload set for B::C, to be loaded from /demo/app/test/b/c.rb 22:31:19 sidekiq.1 | Zeitwerk@rails.main: constant B::C loaded from file /demo/app/test/b/c.rb Code changed, second request: 22:34:24 puma.1 | Zeitwerk@rails.main: ApplicationController unloaded 22:34:24 puma.1 | Zeitwerk@rails.main: A unloaded 22:34:24 puma.1 | Zeitwerk@rails.main: B unloaded 22:34:24 puma.1 | Zeitwerk@rails.main: B::C unloaded 22:34:24 puma.1 | Zeitwerk@rails.main: TestWorker unloaded 22:34:24 puma.1 | Zeitwerk@rails.main: autoload set for ApplicationController, to be loaded from /demo/app/controllers/application_controller.rb 22:34:24 puma.1 | Zeitwerk@rails.main: autoload set for A, to be loaded from /demo/app/test/a.rb 22:34:24 puma.1 | Zeitwerk@rails.main: autoload set for B, to be autovivified from /demo/app/test/b 22:34:24 puma.1 | Zeitwerk@rails.main: autoload set for TestWorker, to be loaded from /demo/app/workers/test_worker.rb 22:34:24 puma.1 | Zeitwerk@rails.main: constant ApplicationController loaded from file /demo/app/controllers/application_controller.rb 22:34:24 puma.1 | Zeitwerk@rails.main: constant A loaded from file /demo/app/test/a.rb 22:34:24 puma.1 | Zeitwerk@rails.main: module B autovivified from directory /demo/app/test/b 22:34:24 puma.1 | Zeitwerk@rails.main: autoload set for B::C, to be loaded from /demo/app/test/b/c.rb 22:34:24 puma.1 | Zeitwerk@rails.main: constant B::C loaded from file /demo/app/test/b/c.rb 22:34:24 puma.1 | Zeitwerk@rails.main: constant TestWorker loaded from file /demo/app/workers/test_worker.rb 22:34:24 sidekiq.1 | Zeitwerk@rails.main: autoload for ApplicationController removed 22:34:24 sidekiq.1 | Zeitwerk@rails.main: TestWorker unloaded 22:34:24 sidekiq.1 | Zeitwerk@rails.main: A unloaded 22:34:24 sidekiq.1 | Zeitwerk@rails.main: B unloaded 22:34:24 sidekiq.1 | Zeitwerk@rails.main: B::C unloaded 22:34:24 sidekiq.1 | Zeitwerk@rails.main: autoload set for ApplicationController, to be loaded from /demo/app/controllers/application_controller.rb 22:34:24 sidekiq.1 | Zeitwerk@rails.main: autoload set for A, to be loaded from /demo/app/test/a.rb 22:34:24 sidekiq.1 | Zeitwerk@rails.main: autoload set for B, to be autovivified from /demo/app/test/b 22:34:24 sidekiq.1 | Zeitwerk@rails.main: autoload set for TestWorker, to be loaded from /demo/app/workers/test_worker.rb 22:34:24 sidekiq.1 | Zeitwerk@rails.main: constant TestWorker loaded from file /demo/app/workers/test_worker.rb 22:34:24 sidekiq.1 | Zeitwerk@rails.main: constant A loaded from file /demo/app/test/a.rb ```

After making changes sidekiq unloads B::C but does not reload it.

fxn commented 2 years ago

First, let me congratulate you on this extremely good bug report. Terrific.

I have been able to reproduce and the behavior is puzzling, I'll dig into it.

fxn commented 2 years ago

If I disable bootsnap, everything works fine.

@casperisfine do you know what could be happening? As a refresher, implicit namespaces like B are loaded configuring an autoload on the directory, whose require is later intercepted by our thin Kernel wrapper. As you know, this has worked routinely fine since always. This must be something new, or maybe some sort of edge case that didn't show up before.

fxn commented 2 years ago

@casperisfine In case it helps, I saw that when the error happens, Zeitwerk's require is not even called.

fxn commented 2 years ago

I have tried with Rails 6.1 + Zeitwerk 2.4/2.5 + Bootsnap 1.9.3 (the same version used in the original demo), and cannot reproduce. Going to bisect Rails 7.

fxn commented 2 years ago

I am now running the application in the host. Have selectively disabled Bootsnap caches:

DISABLE_BOOTSNAP_LOAD_PATH_CACHE
DISABLE_BOOTSNAP_COMPILE_CACHE

and the only one that matters is DISABLE_BOOTSNAP_LOAD_PATH_CACHE.

fxn commented 2 years ago

And I have verified that, when reproducing the problem,

Bootsnap::LoadPathCache.loaded_features_index.key?(path)

returns true in the require wrapper of Bootsnap (here) for the absolute path to app/test/b. In principle, that does not make sense, because that path is never present in $LOADED_FEATURES.

fxn commented 2 years ago

I have the key difference:

In Rails < 7, Zeitwerk's require is called first. Bootsnap's require does not get invoked because that require for a directory is artificial and Zeitwerk knows the original require does not need to be called (it was not a real require).

However, in the Sidekiq process in the demo, Bootsnap require runs first, and then it calls Zeitwerk's. It's some sort of load order, or order of decoration of require, or something like that.

I'll leave it here today, but we have it.

fxn commented 2 years ago

After sleeping on it, the conclusion is that Zeitwerk and Bootsnap are not fully compatible as they are today, because if you load Zeitwerk wrapper first, and Bootsnap's wrapper later, reloading of implicit namespaces won't work.

Remember, Zeitwerk could be used by any gem dependency of the application, as early as you want, or we could be in a project that is not Rails. I'll see why is the order different when we launch Sidekiq in a Rails 7 application (the server reloads fine), but regardless, this issue has uncovered a conflict I was not aware of.

Not sure what's the right way to approach it in the general case. On one hand, Zeitwerk is intercepting the implicit directories it manages and does not pass them up to the original require. On the other hand, Bootsnap assumes the argument is in $LOADED_FEATURES, but it never was. The cache is not caching $LOADED_FEATURES with fidelity.

I can imagine practical solutions to this, but the question is which is the good one conceptually.

byroot commented 2 years ago

Ack. I'll try to look at this in a couple days.

fxn commented 2 years ago

I have considered different approaches:

  1. Document that you need to ensure a certain loading order if using both gems. I believe this is a bad idea, because gem A could use Zeitwerk, gem B could use Bootsnap, and application X could depend on the two of them. It would be a bad user interface that X should coordinate how to load A and B due to transitive dependencies. You should ideally just depend on them and things should just work.
  2. Bootsnap should not cache something that is not in the cached collection. I am not sure if Bootsnap should do this, or which can be the performance impact of it. I can see pros (cache consistency, be defensive/robust) and cons (the assumption is reasonable, why bother?). Whatever @casperisfine decides in this respect is going to be good. ~My proposed solution below does not depend on this.~
  3. Eager load implicit namespaces. Instead of issuing autoload calls for implicit namespaces with the sole purpose of their interception later, when an implicit namespace is found, you autovivify on the spot and recurse. This is neat, I considered it when developing the library, but it breaks the symmetry of how everything works. It breaks "loaded when referenced", it breaks expectations for eager load exceptions, and callbacks would fire in a different way. Would be also a major refactor, though I'd be willing to do it if that turned to be the best solution. I doubt it is.
  4. Let Zeitwerk push by hand the directory to $LOADED_FEATURES. Would be aligned with what happens with ordinary require calls. However, these are internal. We are tunneling these autoloads. The implementation would leak, and maybe then we find some code that assumes $LOADED_FEATURES stores files. In some sense it sounds reasonable, but there are risks too.
  5. Remove directories from $LOADED_FEATURES. On reload, Zeitwerk removes autoloaded files from $LOADED_FEATURES to let require load them again. It deletes files only, to be precise and have the reject! loop do the exact needed work, no more, no less. But, I believe this issue is telling us that we should just delete it all defensively. EDIT: Problem with this option is that $LOADED_FEATURES does not have the directories, so we'd have to reject something that does not exist to expire Bootsnap cache as a side-effect. Doesn't seem right either.

EDIT: I implemented (4) below. My preferred solution is (5), and I have implemented it in this branch. @Earlopain Would it be possible to test your real application with it?

git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem "zeitwerk", github: "fxn/zeitwerk", branch: "loaded-features"
Earlopain commented 2 years ago

First of, thank you for investigating this issue the way you did, I really appreciate it.

I added the snippet you provided to my Gemfile but the problem persists in both the demo and my application. I never used git as a source so I might be doing something wrong here, but I do see Using zeitwerk 2.5.1 from https://github.com/fxn/zeitwerk.git (at loaded-features@74e3c53) while doing bundle install, so I think I did it right. I pushed my attempt to the demo so you can cofirm, just in case I did something silly.

fxn commented 2 years ago

@Earlopain You're right, I did a mistake when trying here and the patch is wrong.

fxn commented 2 years ago

@Earlopain I should have deleted the directories from a $LOADED_FEATURES that do not have them, but it starts to feel weird, instead of defensive.

I have implemented (4) in the same branch, could you please try again? You'll need to bundle update zeitwerk or rebuild the image.

Earlopain commented 2 years ago

I will give the revised fix a try tomorrow.

Earlopain commented 2 years ago

I can confirm that using the branch as it currently is fixes the issue in both the demo and my real application.

fxn commented 2 years ago

@Earlopain I have published 2.5.2 with this change. Thanks again for your awesome bug report.

@casperisfine I believe you can mostly skip the thread.

Summary is: Zeitwerk's require decorator did not push directories for implicit namespaces to $LOADED_FEATURES. This made sense in isolation, but it is more debatable when you consider other wrappers may be in turn decorating you. Bootsnap's assumes a successful require pushes what was required to $LOADED_FEATURES, and I believe that is a reasonable assumption.

So, in 2.5.2 Zeitwerk's require wrapper pushes those directories to $LOADED_FEATURES.

The trade-off is that now we have directories in $LOADED_FEATURES, which may break an expectation of having only files others could rely on. I can't think of a use case for that assumption, but who knows. The only way to find out is by taking the risk.

byroot commented 2 years ago

@fxn thanks for the summary, I must admit I was a bit lazy to dig into this during my vacations :)

Directories in $LOADED_FEATURES don't really scare me.

fxn commented 2 years ago

@byroot enjoy your holidays man :).

fxn commented 2 years ago

I have reverted that patch and published 2.5.3, because the performance impact was too much. In CRuby < 3.1 the impact was noticeable, but In CRuby 3.1 it was dramatic: implicit namespaces were about 180x slower because direct modifications of $LOADED_FEATURES rebuild internal stuff.

This follows the advice of @jeremyevans, who also believes Bootsnap is responsible for having a cache that is in sync with the cached collection, and that Zeitwerk's original behavior seems more reasonable than Bootsnap assumptions.

fxn commented 2 years ago

I have opened https://github.com/Shopify/bootsnap/issues/383.

Meanwhile, anyone facing this can disable the load path cache in config/boot.rb:

# config/boot.rb

ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../Gemfile", __dir__)

require "bundler/setup" # Set up gems listed in the Gemfile.

ENV['DISABLE_BOOTSNAP_LOAD_PATH_CACHE'] = '1' # <---------- ADD THIS LINE
require "bootsnap/setup" # Speed up boot time by caching expensive operations.
fxn commented 2 years ago

While the truly root issue we need to address is that Bootsnap and Zeitwerk should ideally be able to be loaded in any order, I have nonetheless investigated why is the order different in Rails 7.

Rails 6.x had to be ready to boot in two different autoloading modes, and the integration was written to carefully leave everything as it was in Rails 5 if running in classic mode (so that we had a truly "works as always" fallback for upgrades). So, in Rails 6.x Zeitwerk was loaded as lazily as possible, and in particular

require "rails"

by itself did not load Zeitwerk.

In Rails 7, all this has been greatly simplified internally because there is only one autoloading mode, and railties/lib/rails.rb#15 loads Zeitwerk as a side-effect. That require makes sense due to the definition of the method Rails.autoloaders down below.

Now, the bin/rails command evaluates config/boot.rb early and Bootsnap performs its setup there. On the other hand, Sidekiq performs a require "rails" before config/boot.rb is loaded (here), and in this case Zeitwerk is loaded before Bootsnap.

So, that's only for understanding. This load order should not matter if we can make the libraries fully-compatible in this aspect.

fxn commented 2 years ago

While the edge case with Bootsnap is addressed at its root, I have delayed loading Zeitwerk in Rails just enough to make this loading order be a non-issue in Rails apps, it will ship with 7.0.1.

Now I do feel we can close this issue.

Earlopain commented 2 years ago

Thanks again, I appreciate it

casperisfine commented 2 years ago

I opened a PR to fix this, if you feel like trying it: https://github.com/Shopify/bootsnap/pull/385