Shopify / bootsnap

Boot large Ruby/Rails apps faster
MIT License
2.67k stars 180 forks source link

Support changes in $LOADED_FEATURES #224

Closed fxn closed 5 years ago

fxn commented 5 years ago

Right now, the decorated require trusts its cache:

return false if Bootsnap::LoadPathCache.loaded_features_index.key?(path)

So it does not preserve require semantics:

require "foo" # => true
$LOADED_FEATURES.pop
require "foo" # => false, should be true

I've seen some comments in the code base where it is clear this is a known trade-off rather than an overlook.

I am raising this issue because Zeitwerk depends on require semantics in the sense that require checks whatever $LOADED_FEATURES has at the moment. It needs that behavior because that is the trick it uses to have new autoloads load the same file again on reload, Zeitwerk removes the files from $LOADED_FEATURES.

I don't know bootsnap enough to think about how we could make both things play well together, but something that comes to mind is that if the cache has the file, then just call require with the corresponding absolute path. That is a trade-off, there is an extra $LOADED_FEATURES lookup, but it is complaint.

What do you think?

fxn commented 5 years ago

Also, we have to think that bootsnap does not own require, there might be other decorations upwards. Zeitwerk for example also decorates require. I believe it does so after bootsnap, but other code may have done it before.

fxn commented 5 years ago

Another idea that could be perhaps related, is that perhaps bootsnap could ignore calls with absolute paths. Zeitwerk internally works only with absolute paths.

Not exactly a proposal, you know better the principles behind bootsnap, just an idea that came to mind while looking into this.

fxn commented 5 years ago

Well, to be exact, Zeitwerk sets autoloads using only absolute paths, and preloads/eager loads using absolute paths. Projects managed by Zeitwerk do not even need to be in $LOAD_PATH, it just works if they are not by design.

But if require gets a relative path it still tries to play well with it in case client code of gems using Zeitwerk issues some.

That is undocumented, it is an attempt to do as much as possible to work seamlessly.

rafaelfranca commented 5 years ago

cc @burke

fxn commented 5 years ago

Just to throw another option, perhaps the integration of Zeitwerk in Rails could also invalidate the bootsnap cache.

Zeitwerk has reloading implemented, but it has to be triggered by client code. In the case of Rails, the framework is responsible for knowing when to reload, and when that happens it knows that it has to call Rails.autoloader.reload and do so in a thread-safe way. Similarly, that integration point could tell bootsnap to invalidate its cache. That could make sense, because files have changed, some may not even exist and a require call should err instead of returning false.

Probably, that invalidation should affect also the stable cache, because since engines are in autoload paths, they are reloaded too even if they are gems. I would like to change this in the future, but today that is the way it is.

As someone not familiar with bootsnap, I still believe that delegating to require with an absolute path to preserve semantics (ie, a file has been deleted, require has been decorated, etc.) is an option that sounds good.

Maybe even both things would be the complete solution, because if the file system has changed and it turns out a/foo.rb has been removed and at the same time b/foo.rb has been created and a and b are in the load paths, foo should resolve to a different absolute path.

In summary, a Rails programmer that changes the file system expects the changes to be available right away.

Only trying to give ideas and explain the uses cases. You guys are in a better position to decide which is the approach that squares better with bootsnap design and trade-offs.

burke commented 5 years ago

Hm, this is likely going to be tricky.

First point, we do some rather gross monkeypatching of the ActiveSupport autoloader (https://github.com/Shopify/bootsnap/blob/master/lib/bootsnap/load_path_cache/core_ext/active_support.rb). I'm not sure how much of this becomes irrelevant with, or conflicts with, Zeitwerk.

I'm not, in principle, opposed to making some minor performance compromises here to accommodate Zeitwerk.

It's really too bad that $LOADED_FEATURES is modified from C-land, so we can't simply hook all the cases where it's mutated like we're able to with $LOAD_PATH; that said, we should be able to hook whichever method Zeitwerk is using to mutate it, and probably respond accordingly to that manipulation.

burke commented 5 years ago

Looking at https://github.com/Shopify/bootsnap/blob/2390a1a51cc8922ded8ae7aab3f8cf39cdde5847/lib/bootsnap/load_path_cache/loaded_features_index.rb, I'm noticing that we only currently track e.g. foo out of /a/b/c/foo.rb, so it would be hard to respond correctly to a $LOADED_FEATURES.delete. We could plausibly correct this by doing something like:

            @lfi[short] = full_feature.hash
            @lfi[strip_extension(short)] = full_feature.hash

...then when deleting a feature, we could scan the index for suffixes of that path whose entry is the hash of that full path, pruning them as we go.

This would be a bit expensive, so we'd want to weigh how often this operation would happen versus the additional cost of maintaining a reverse index of the full feature path.

fxn commented 5 years ago

Hey @burke!

The integration is still WIP, but it could be the case that the Active Support monkey patches in bootsnap do not interfere. The way Zeitwerk is being integrated is basically by calling take_over from this module at some point in the initialization process. (Warning: I am force pushing to that branch.)

As you see, it is a quite lightweight decoration to preserve a subset of the interface that I have considered worth maintaining by now. The key observation that simplifies this integration is that Kernel#autoload is called before const_missing. Therefore, for a correct project structure, the autoloading code of AS::Dependencies won't ever run.

It could be the case that I unhook also AS::Dependeces in take_over, just to play safe. And it could be the case that instead of bootsnap/setup we generate a setup that disables bootsnap's monkey patch (from memory I believe that happens if you disable auto_load_paths). All is speculative right now.

But bootsnap worked as is in a simple application, the only think that didn't work was reloading, for the reasons we are studying here. At least by now, maybe other things may be waiting to be discovered, but by now that is what I've seen.

Zeitwerk only calls $LOADED_FEATURES.delete on reload. A reload is triggered by Rails when code changes, or when you call reload! in the console, for example. You know. Anything that invokes AS::Dependencies.clear. Maybe that helps leaning on a particular approach?

kaspth commented 5 years ago

Maybe Zeitwerk could broadcast when reloading happens and it thus mutates $LOADED_FEATURES or similar like that? Instead of preying on the mutation.

fxn commented 5 years ago

Well, Zeitwerk itself is agnostic in that sense, it just offers reload in case client code is a serivce that monitors the file system and wants to refresh the code.

But the Rails integration could call once a method that invalidates bootsnap cache, for example:

def clear
  Dependencies.unload_interlock do
    Rails.autoloader.reload
    Bootsnap.rescan
  end
end

That is what I meant above by saying

Similarly, that integration point could tell bootsnap to invalidate its cache.

I have to say that as far as require is concerned bootsnap is not necessary for projects managed by Zeitwerk, because the library directly issues require calls with absolute paths scanning once. But if Rails ships both of them, at least they should be able to coexist.

fxn commented 5 years ago

Regarding the last paragraph, if bootsnap had a way to say "do not cache anywthing volatile, only stable", that could also be a possibility for Rails integration maybe.

fxn commented 5 years ago

Let me summarize the different options so far and what I believe are the cons of each one:

  1. Provide API to rescan on file system changes: Costly and with little benefit. Zeitwerk carefully lazy scans the app tree precisely thinking about the cost of a tree traversal in Shopify's code base (discussed with @rafaelfranca). A rescan from bootsnap would defeat that optimization. Additionally, it would do so with little benefit, since code managed by Zeitwerk does not issue require calls with relative paths and bootsnap cache for relative paths won't be leveraged.

  2. Let require call the parent implementation with the cached absolute path instead of returning a hard-coded false for loaded files: The cache may be stale.

  3. Provide API to ignore so-called volatile paths: Since bootsnap would only scan a subset of $LOAD_PATH, it could resolve to a wrong filename if the relative path should resolve to something outside the stable paths due to a (maybe deliberate) name coincidence.

  4. Ignore require calls with absolute paths totally, do not cache, always pass them up: The user may exceptionally call require or require_dependency for whatever reason with a relative path, and we are at square one.

burke commented 5 years ago

~1.4.0.pre2~ 1.4.0.pre3 is pushed to rubygems, should work for you I think.

fxn commented 5 years ago

I believe this is working now, let's close.