In Rails, `app/jobs/seeder/` directory breaks `db:seed` task

IngoTomahogh commented 3 months ago

Adding any file(s) app/jobs/seeder/*.rb (with or without content, doesn’t matter) in Rails will confuse Zeitwerk into assuming that there must be a module Seeder, breaking any subsequent class Seeder; end statements wherein any custom seeder code would be defined.

If such files exist, running rails db:seed --trace will lead to this error message:

** Invoke db:seed (first_time)
** Invoke db:load_config (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:load_config
** Execute db:seed
** Invoke db:abort_if_pending_migrations (first_time)
** Invoke db:load_config 
** Execute db:abort_if_pending_migrations
D, [2024-07-18T14:31:55.673313 #2865] DEBUG -- :    (1.2ms)  SELECT `schema_migrations`.`version` FROM `schema_migrations` ORDER BY `schema_migrations`.`version` ASC
rails aborted!
TypeError: Seeder is not a class (TypeError)
/bundler/ruby/3.3.0/gems/zeitwerk-2.6.16/lib/zeitwerk/cref.rb:85: previous definition of Seeder was here

(Loaded from /app/db/seeder.rb, /app/db/seeds.rb, /bundler/ruby/3.3.0/gems/railties-6.1.7.8/lib/rails/engine.rb, etc.)

Versions used:

Zeitwerk 2.6.16
Rails 6.1.7.8
Ruby 3.3.1

fxn commented 3 months ago

Hi. Where is the Seeder class defined?

IngoTomahogh commented 3 months ago

Hi. Where is the Seeder class defined?

We define it in db/seeder.rb, loaded via db/seeds.rb — which, as I now realize, might be part of the issue, because it’s not in the standard load path (and not defined by Rails, either).

Edit: The file in app/jobs/seeder/ also contained a class Seeder statement (in an attempt to avoid such issues), along the lines of


class Seeder
  class WriteLotsOfDataJob < ApplicationJob
    def perform = DateWriter.run
  end
end

fxn commented 3 months ago

Got it, before I answer let me ask a couple more questions. Do you plan to use those files except for seeding? Why are they in app/jobs?

IngoTomahogh commented 3 months ago

Got it, before I answer let me ask a couple more questions. Do you plan to use those files except for seeding? Why are they in app/jobs?

It’s just one job that will be scheduled multiple times to create a large amount of data. We might also decide to invoke it from a custom rake task.

An error message pointing me to that directory (instead of the place in the Zeitwerk code) would have helped already, but creating the module without loading the Ruby files is confusing anyway :) Yes, it’s a fringe case, but that makes it all the more confusing.

fxn commented 3 months ago

Got it.

First, let me explain that Zeitwerk expectations are entirely based on file name conventions, it does not parse file contents. If Seeder is not a defined constant and there is no seeder.rb file managed by the autoloader, by convention that is considered to be an implicit namespace. So, Zeitwerk creates a dummy module called Seeder when needed.

So, Zeitwerk crates a module called Seeder and when a job is loaded, Ruby is the one raising an error. I could investigate if this error could be captured and transformed into something more useful.

This is a name clash, sometimes Seeder is defined as a side-effect of running db:seeds, sometimes it isn't. If defined on boot, since it is not managed by the autoloader, Zeitwerk considers you are reopening a 3rd-party namespace and all works.

If the Seeder class is not defined, the job is defining it as a side-effect.

Basically, the definition of Seeder is not deterministic in this application, sometimes it is a Seeder class with (I assume) stuff in it. Other times it wants to be a Seeder class without methods and with unrelated constants.

This situation is confusing, the ownership of the Seeder class and its definition are unclear. I believe you have three options:

Rename the seeder directory to clearly separate unrelated things.
Define the Seeder class during initialization, so that both seeds and jobs reopen it not matter what.
Define jobs using a constant path: class Seeder::WriteLotsOfDataJob < ApplicationJob, so that no matter if the namespace is a module or a class, the job is going to be loaded anyway.

IngoTomahogh commented 3 months ago

Thanks for the detailed explanation — Zeitwerk’s behavior does make sense, especially given that our Seeder class doesn’t have a seeder.rb file within Zeitwerk’s load path.

A more explanatory error message would be really neat, but I realize that it might be difficult to achieve.

As for solving my problem, loading the job never was a problem (as that would cause the file within the app/jobs/seeder/ directory to be loaded and the Seeder class to be defined as such).

In the end, I went with option 4 and actually made Seeder a module. — As it turned out, this piece of legacy code is just used as a namespace for a bunch of methods and modules (and now the job class), so it’s not a problem to give in to Zeitwerk’s conventions and make it a module 😄

Again, thank you for your rapid and helpful response!

fxn commented 3 months ago

Awesome, I'll think about the improved error message, glad you found a solution!

fxn commented 3 months ago

As for solving my problem, loading the job never was a problem (as that would cause the file within the app/jobs/seeder/ directory to be loaded and the Seeder class to be defined as such).

Let me add that this does not square.

If that Seeder class is (was) loaded only when db:seeds runs, and therefore is not defined when background jobs are being executed, say, Zeitwerk is going to create a module called Seeder on demand before any job file under app/job/seeder is loaded.

Therefore, when Ruby finds class Seeder, it says: "Cool, I already have a Seeder constant, let me grab the value it stores. I am in the context of a class keyword but got a module object, that is a TypeError".

If the jobs without the seeds were being loaded, I believe there is something else missing in our discussion.

fxn / zeitwerk

In Rails, `app/jobs/seeder/` directory breaks `db:seed` task #299