bamorim / typed_ecto_schema

A library to define Ecto schemas with typespecs without all the boilerplate code.
https://hexdocs.pm/typed_ecto_schema
Apache License 2.0
271 stars 18 forks source link

typed_schema introduces cyclic compile-time dependencies #6

Closed randycoulman closed 2 years ago

randycoulman commented 4 years ago

I really love the convenience of this library. Thank you for making it!

We recently started using typed_schema in our project but have noticed that we get a lot of spurious recompilation happening. For example, we can compile our code and start our Phoenix server. Then, without making any code changes, every request to the server will trigger the recompilation of a number of files.

I ran mix xref graph --format cycles --label compile and found that there were mutual compile dependencies between any of our schema modules that were associated with each other. If I modify one of those schemas to use schema and a hand-written typespec, I do not see those same dependencies.

I am using Elixir 1.10.0 here, but I also tested with 1.11.0 and got the same results. I guess that means that the dependency improvements in 1.11 don't help with this situation.

I was not able to figure out what typed_schema is doing that causes this dependency cycle. I don't want to have to start writing typespecs by hand, so I'm hoping there's a workable solution.

If you have an idea of what I could try, I'd be happy to work on a PR, but I don't know enough to do that yet.

I managed to reproduce the problem in a standalone project. I'm happy to push up the repo if you like, but here are the basics:

With normal schemas:

# Person.ex
defmodule Person do
  use Ecto.Schema

  @enforce_keys [:name]

  schema "people" do
    field(:name, :string)
    field(:age, :integer)
    field(:happy, :boolean, default: true)
    field(:phone, :string)
    belongs_to(:company, Company)
    timestamps(type: :naive_datetime_usec)
  end

  @type t() :: %__MODULE__{
          __meta__: Ecto.Schema.Metadata.t(),
          id: integer() | nil,
          name: String.t(),
          age: non_neg_integer() | nil,
          happy: boolean(),
          phone: String.t() | nil,
          company_id: integer() | nil,
          company: Company.t() | Ecto.Association.NotLoaded.t() | nil,
          inserted_at: NaiveDateTime.t(),
          updated_at: NaiveDateTime.t()
        }
end

# Company.ex
defmodule Company do
  use Ecto.Schema

  @enforce_keys [:name]

  schema "company" do
    field(:name, :string)
    has_one(:ceo, Person)
    timestamps(type: :naive_datetime_usec)
  end

  @type t() :: %__MODULE__{
          __meta__: Ecto.Schema.Metadata.t(),
          id: integer() | nil,
          name: String.t(),
          ceo: Person.t() | Ecto.Association.NotLoaded.t() | nil,
          inserted_at: NaiveDateTime.t(),
          updated_at: NaiveDateTime.t()
        }
end
$ mix xref graph --format cycles --label compile
lib/company.ex
lib/person.ex
lib/typed_schema_cycle.ex

With typed_ecto_schema:

# Person.ex
defmodule Person do
  use TypedEctoSchema

  typed_schema "people" do
    field(:name, :string, enforce: true, null: false)
    field(:age, :integer) :: non_neg_integer() | nil
    field(:happy, :boolean, default: true, null: false)
    field(:phone, :string)
    belongs_to(:company, Company)
    timestamps(type: :naive_datetime_usec)
  end
end

# Company.ex
defmodule Company do
  use TypedEctoSchema

  typed_schema "company" do
    field(:name, :string, enforce: true, null: false)
    has_one(:ceo, Person)
    timestamps(type: :naive_datetime_usec)
  end
end
$ mix xref graph --format cycles --label compile
lib/company.ex
└── lib/person.ex (compile)
    └── lib/company.ex (compile)
lib/person.ex
lib/typed_schema_cycle.ex
randycoulman commented 3 years ago

I did some digging and experimenting on this issue today and I believe I have found the issue. I'm not sure how to fix it or if it is even fixable.

It turns out that, if you reference another module at all within the top level of a module definition, Elixir will add a compile-time dependency on that module. For example, if I have two files:

# a.ex
defmodule A do
end

#b.ex
defmodule B do
  A
end

Elixir will report that B has a compile-time dependency on A. This may be an Elixir bug, but there may also be a really good reason for it.

I opened https://github.com/elixir-lang/elixir/issues/10565 to find out for sure.

How that applies here is that this library ends up expanding macros that result in a call that looks something like TypeBuilder.add_field(__MODULE__, :belongs_to, :company, Company, []) at the top-level of the module. Since there is now a reference to Company at the top level, Elixir adds a compile-time dependency on Company which ultimately ends up causing the circular dependency issue here.

Is there a way to achieve the goals of this library without causing this dependency issue? Can we somehow accumulate all of the necessary information in module attributes without a top-level function call and then generate the typespec we need?

randycoulman commented 3 years ago

According to the response I got to elixir-lang/elixir#10565, it seems that the compile-time dependency is required and not a bug.

I did a bit more experimenting. I modified one of my files to look like this so I could see the macro-expanded code:

defmodule Company do
  use TypedEctoSchema

  code =
    quote do
      typed_schema "company" do
        field(:name, :string, enforce: true, null: false)
        has_one(:ceo, Person)
        timestamps(type: :naive_datetime_usec)
      end
    end

  Macro.expand(code, __ENV__) |> Macro.to_string() |> IO.puts()
end

The resulting output looks like this:

(
  (
    require(TypedEctoSchema.TypeBuilder)
    TypedEctoSchema.TypeBuilder.init([])
  )
  TypedEctoSchema.TypeBuilder.add_meta(__MODULE__)
  Ecto.Schema.schema("company") do
    TypedEctoSchema.TypeBuilder.add_primary_key(__MODULE__)
    (
      (
        field(:name, :string, null: false)
        TypedEctoSchema.TypeBuilder.add_field(__MODULE__, :field, :name, :string, enforce: true, null: false)
      )
      (
        has_one(:ceo, Person)
        TypedEctoSchema.TypeBuilder.add_field(__MODULE__, :has_one, :ceo, Person, [])
      )
      (
        timestamps(type: :naive_datetime_usec)
        TypedEctoSchema.TypeBuilder.add_timestamps(__MODULE__, Keyword.merge(@timestamps_opts, type: :naive_datetime_usec))
      )
    )
    TypedEctoSchema.TypeBuilder.enforce_keys()
  end
  TypedEctoSchema.TypeBuilder.define_type([])
)

If I'm interpreting that correctly, it looks like the macro expansion results in the calls to TypeBuilder functions appearing at the top level of the module, which triggers the compile-time dependency that I'm seeing.

I'm not sure this makes sense, but is there a way to move those calls to macro-expansion time so that the resulting AST looks like what it would look like if the code was written by hand?

bamorim commented 3 years ago

Hey @randycoulman sorry for not being responsive. I'm just writing this message to let you know that I'll try to take a look at this issue and thanks for the message. I'll read your message thoroughly and will get back to you as soon as I have enough to write a decent response.

dvic commented 3 years ago

Hi @bamorim, I've read this issue and I was wondering exactly the same thing lately. Do you have any plans for this? Otherwise I might want to take a stab at it, but I feel it will take me a while because it feels the whole approach to the code generation needs to be changed to avoid compile time dependencies? 😅

bamorim commented 3 years ago

@dvic I'm afraid you are correct. I started taking a look at this back then and it looked like it. I still didn't had the time to go deep and find better solutions.

If you want to try it out it would be much appreciated.

A "hacky" thing would be to convert the module names to string and then convert them back to atoms afterwards when actually generating the code, but that seems nasty.

Another option (which I think is best) would be to instead adding information to the module on all calls to field/timestamps/etc, would be to analyze the AST in the end and generate all fields at once without putting anything in the module - level.

But I still have to investigate it further.

dvic commented 3 years ago

Thanks for the quick response! I was thinking to do the latter (analyzing the AST and generating the fields at once) but it might be more work, especially if many tests fail (I don't remember to which degree the tests are coupled to the implementation). If it's more work then the "hack" might be a solution, I'll see what I can do :)

dvic commented 3 years ago

I have cooked up something, I'll push a PR tonight. All tests are passing (and I added a few for resolving macros inside the do/end block) but it still has compile deps 😂

Example:

defmodule Foo do
  use TypedEctoSchema
  typed_schema "foo" do
    field :name, :string
    has_one :bar, Bar
  end
end

Produced AST:

➜ mix xref graph --source lib/foo.ex
Compiling 1 file (.ex)
# the following is the ast returned by typed_schema
[
  {{:., [], [{:__aliases__, [alias: false], [:Ecto, :Schema]}, :schema]}, [],
   [
     "foo",
     [
       do: {:__block__, [],
        [{:field, [], [:name, :string]}, {:has_one, [], [:bar, Bar]}]}
     ]
   ]},
  {:@, [context: TypedEctoSchema.TypeBuilder, import: Kernel],
   [
     {:type, [context: TypedEctoSchema.TypeBuilder],
      [
        {:"::", [],
         [
           {:t, [], []},
           {:%, [],
            [
              {:__MODULE__, [], TypedEctoSchema.TypeBuilder},
              {:%{}, [],
               [
                 __meta__: {{:., [], [Ecto.Schema.Metadata, :t]}, [], []},
                 id: {:|, [], [{:integer, [], []}, nil]},
                 name: {:|, [], [{{:., [], [String, :t]}, [], []}, nil]},
                 bar: {:|, [],
                  [
                    {{:., [], [Ecto.Schema, :has_one]}, [],
                     [{{:., [], [Bar, :t]}, [], []}]},
                    nil
                  ]}
               ]}
            ]}
         ]}
      ]}
   ]}
]
lib/foo.ex
└── lib/bar.ex (compile)

Replacing those two instances of Bar with {:__aliases__, [alias: false], [:Bar]} does not help. What am I missing here 🤔?

dvic commented 3 years ago

By the way, currently the code relies on before_compile, maybe that's the problem? So the direct AST of typed_schema is actually:

Module.__put_attribute__(__MODULE__, :before_compile, TypedEctoSchema.TypeBuilder, nil)

The reason for this was (if I remember correctly 😅) that I needed to access module attributes like @timestamps_opts and primary_key, but we can discuss internals once I push the PR.

dvic commented 3 years ago

update: I found out why this is happening, it's because I prewalk the ast and call Macro.expand :) I know how I can fix this, will keep you posted!

dvic commented 3 years ago

PR is submitted :)

bamorim commented 2 years ago

Sorry for thanking that long and many thanks to @dvic for fixing it. It was merged and I'll be releasing a new version soon.