fxn / zeitwerk

Efficient and thread-safe code loader for Ruby
MIT License
1.99k stars 118 forks source link

[Question] Accessing attributes marked `internal` #259

Closed martin-brennan closed 1 year ago

martin-brennan commented 1 year ago

Hi,

I'm an engineer at Discourse, and we use Zeitwork in the core discourse/discourse repo. We've got a rails initializer, part of which looks like this, to watch for files that are not autoloaded and to bounce the running development server when they are changed:

Listen
  .to(*paths, only: /\.rb$/) do |modified, added, removed|
    supervisor_pid = UNICORN_DEV_SUPERVISOR_PID
    auto_restart = supervisor_pid && ENV["AUTO_RESTART"] != "0"

    files = modified + added + removed

    not_autoloaded =
      files.filter_map do |file|
        autoloaded = Rails.autoloaders.main.autoloads.key? file
        Pathname.new(file).relative_path_from(Rails.root) if !autoloaded
      end

    if not_autoloaded.length > 0
      message =
        (
          if auto_restart
            "Restarting server..."
          else
            "Server restart required. Automate this by setting AUTO_RESTART=1."
          end
        )
      STDERR.puts "[DEV]: Edited files which are not autoloaded. #{message}"
      STDERR.puts not_autoloaded.map { |path| "- #{path}".indent(7) }.join("\n")
      Process.kill("USR2", supervisor_pid) if auto_restart
    end
  end
  .start

The relevant line to this question is here:

autoloaded = Rails.autoloaders.main.autoloads.key? file

This line started erroring in the last couple of days, since we merged this DependaBot PR https://github.com/discourse/discourse/pull/20253 which contained the following commit:

https://github.com/fxn/zeitwerk/commit/ad6c0284849b96198df9175bf367ea83a398e6c3

The error goes something like this:

NoMethodError: private method `autoloads' called for #<Zeitwerk::Loader:0x000000010ea7d3c0
....
          autoloaded = Rails.autoloaders.main.autoloads.key? file
                                             ^^^^^^^^^^
Did you mean?  __autoloads

I had a look into this and it seems like the easiest way to fix this is to just refer to the "mangled" version of autoloads from the Internal module, which is __autoloads.

My question is -- should we be doing this? Is there a better way we can use to check if a file has been autoloaded in this hash? Or is purposefully using this internal attribute fine?

Rails.autoloaders.main.__autoloads
=> {"/Users/mb/repos/discourse/lib/admin_confirmation.rb"=>[Object, :AdminConfirmation],
 "/Users/mb/repos/discourse/lib/admin_user_index_query.rb"=>[Object, :AdminUserIndexQuery],
 "/Users/mb/repos/discourse/lib/age_words.rb"=>[Object, :AgeWords],
 "/Users/mb/repos/discourse/lib/auth/current_user_provider.rb"=>[Auth, :CurrentUserProvider],
 "/Users/mb/repos/discourse/lib/auth/default_current_user_provider.rb"=>[Auth, :DefaultCurrentUserProvider],
 "/Users/mb/repos/discourse/lib/autospec"=>[Object, :Autospec],
 "/Users/mb/repos/discourse/lib/backup_restore.rb"=>[Object, :BackupRestore],

Thanks!

fxn commented 1 year ago

Hey Martin!

I had a look into this and it seems like the easiest way to fix this is to just refer to the "mangled" version of autoloads from the Internal module, which is __autoloads.

Absolutely not. Internal attributes are internal, that's the point, they do not belong to the public interface. They are not Ruby-private because Zeitwerk accesses them internally. The mangled version is there for Zeitwerk and the leading double scores (besides lack of documentation) try to scream that should not be found in client code.

Reason is, Zeitwerk does not expose any of its internal state, becasue that is how it can assume things about it. Also, the way internal state is managed is arbitrary and subject to optimizations and whatever trick the internal logic can rely on.

While you only check for files, an interface in Zeitwerk for that would also need to deal with implicit namespaces, which can be spread in multiple directories.

I saw you patched with __autoloads. It's OK to workaround, and opening this issue was the way to go, because knowing that there is a use case for private interface is valuable feedback for me.

Let me think about it.

fxn commented 1 year ago

Indeed, if I understand that initializer correcttly, it is not right.

Zeitwerk descends lazily. So, if you touch app/models/foo/bar/baz.rb but Foo::Bar has not been used, the file is not in that hash, and you'd restart the server on auto start, which is not the intention of the code, as I undesrtand it.

The purpose of that hash is to let the loader keep track of the autoloads that have been set and not executed. It is a transient collection not meant to be used externally. Its meaning and maintenance logic are not public.

davidtaylorhq commented 1 year ago

keep track of the autoloads that have been set and not executed

For files that have been 'executed', Rails/Zeitwerk will unload them when they're modified (i.e. put them back into the autoloads hash). That happens immediately before our custom logic. So, at this point in time, "is filename in autoloads?" is a pretty good analogue for "can this file be autoloaded?".

Zeitwerk descends lazily

That is good to know, thanks - we must be making some unnecessary server restarts in some cases. I suppose we don't hit it too often, since the majority of things in our rails app are top-level constants (e.g. controllers, models, etc.).

Ideally, we would have an API like is_file_already_loaded_without_autoloading?. Those are the changes which truly require a hard restart.


Edited-in response from @fxn:

keep track of the autoloads that have been set and not executed

For files that have been 'executed', Rails/Zeitwerk will unload them when they're modified (i.e. put them back into the __autoloads hash). That happens immediately before our custom logic. So, at this point in time, "is filename in __autoloads?" is a pretty good analogue for "can this file be autoloaded?".

No, this is incorrect.

Ideally, we would have an API like is_file_already_loaded_without_autoloading?. Those are the changes which truly require a hard restart.

My plan is to implement Zeitwerk::Loader#manages?, which will allow your code to select the files outside of the autoload paths, and with edge cases covered like honouring ignored files.

fxn commented 1 year ago

Actually, not sure about that predicate. Going to followup in Discourse's repo (here).

fxn commented 1 year ago

I have been thinking about that use case for a few days, and by now the resolution is that it is not something in line with the design of the library. I believe you can still accomplish the logic you want, but it won't be via Zeitwerk API.

I'll explain in the ticket linked above.