esl / gradient

Gradient is a static typechecker for Elixir
Apache License 2.0
435 stars 13 forks source link

`The spec ... doesn't match the function name/arity` for generated code #150

Open japhib opened 1 year ago

japhib commented 1 year ago

If you generate a simple Phoenix app with mix phx.new, add Gradient as a dependency, and then run mix gradient, you get these errors (among others). For an example, see this PR: https://github.com/esl/gradient/pull/149

lib/simple_phoenix_app/repo.ex: The spec rollback/1 on line 2 doesn't match the function name/arity
lib/simple_phoenix_app_web/views/layout_view.ex: The spec template_not_found/2 on line 2 doesn't match the function name/arity
lib/simple_phoenix_app_web/views/page_view.ex: The spec template_not_found/2 on line 2 doesn't match the function name/arity

The spec referred to is generated at compile time by a use statement. Using lib/simple_phoenix_app_web/views/layout_view.ex as an example, here's the code in that actual file:

# lib/simple_phoenix_app_web/views/layout_view.ex:1-2
defmodule SimplePhoenixAppWeb.LayoutView do
  use SimplePhoenixAppWeb, :view
...

And here's the code that use SimplePhoenixAppWeb, :view generates:

# deps/phoenix_view/lib/phoenix_view.ex:209-223
  quote do
...
      @spec template_not_found(binary, map) :: no_return
      def template_not_found(template, assigns) do
        Phoenix.View.__not_found__!(__MODULE__, template, assigns)
      end
...

In the quoted code, the spec correctly comes right before the function. But when I decompile the LayoutView module (via Gradient.Debug.print_erlang(SimplePhoenixAppWeb.LayoutView)), the spec is placed at the beginning of the file, before the __info__ function definition. I believe that's why it gives the error.

I think the way to fix this would be to get the Elixir AST separately from the compiled bytecode so we could check if the function is generated, and in that case, allow the specs to be out of order.

luk-pau-es commented 1 year ago

After investigation: The code generated by Elixir compiler doesn't have generated: true attribute hence it produces error, I am adding example from repo.ex file for further investigation

{:function, 2, :rollback, 1,
    [
        {:clause, 2, [{:var, 2, :_@1}], [],
            [
                {:call, 2,
                {:remote, 2, {:atom, 2, Ecto.Repo.Transaction}, {:atom, 2, :rollback}},
                [{:call, 2, {:atom, 2, :get_dynamic_repo}, []}, {:var, 2, :_@1}]}
        ]}
    ]},
japhib commented 1 year ago

Just added https://github.com/esl/gradient/pull/162 with a simplified test that isolates the problem.

Looking a bit deeper into this, here is what I have found:

Gradient.AstSpecifier.run_mappers is the entry point into the part of the code where we determine whether code is generated or not. It seems like this decision is delegated to :erl_anno.generated/1 in a couple spots.

(However, this might not be right ... if I replace calls to :erl_anno.generated/1 with the hard-coded value true, the test still fails. Maybe we're lacking a check later on to see if a warning comes from generated code, and if so, ignore it?)

It seems like in the current implementation, we don't have sufficient information to determine whether code is generated or not. We just use the Erlang AST from :beam_lib.chunks(path, [:abstract_code]), and in that representation of the AST, a function defined by a use statement looks exactly the same as a function defined explicitly. Here's the code I'm testing with:

defmodule GeneratingCode do
  defmacro __using__(_) do
    quote do
      @spec wrong_ret() :: atom()
      def wrong_ret() do
        1234
      end
    end
  end
end

defmodule GeneratedCode do
  use GeneratingCode

  @spec wrong_ret2() :: atom()
  def wrong_ret2() do
    1234
  end
end

And here's what the AST looks like to Gradient:

iex(1)> Gradient.ElixirFileUtils.get_forms("test/examples/_build/Elixir.GeneratedCode.beam")
{:ok,
 [
   [

    ... (attributes and __info__ function) ...

     {:function, 13, :wrong_ret, 0, [{:clause, 13, [], [], [{:integer, 13, 1234}]}]},
     {:function, 16, :wrong_ret2, 0, [{:clause, 16, [], [], [{:integer, 16, 1234}]}]}
   ]
 ]}

As you can see above, the function wrong_ret created by the line use GeneratingCode, and the function wrong_ret2 declared explicitly with def wrong_ret2() look exactly the same to Gradient.

Inside AstSpecifier.run_mappers, we do have access to the tokens in the file. So one thing we could do is look at the tokens for the line declaring the function, and if it's a use statement, just hard-code that case to be generated. But, that only covers the use macro. Ideally, the fix here would be able to tell in all cases whether a macro is being used or not.

I believe the best solution is to change the way that we're getting the AST to include the Elixir debug info as well. The above AST, fetched with ElixirFileUtils.get_forms(), under the hood uses this function call to get the AST:

:beam_lib.chunks(path, [:abstract_code])

But if we change it to what's used in Gradient.Debug.elixir_ast/1, which is this:

:beam_lib.chunks(path, [:debug_info])

then we get a more interesting representation:

iex(2)> Gradient.Debug.elixir_ast('test/examples/_build/Elixir.GeneratedCode.beam')
{:ok,
 %{
...
   definitions: [
     {{:wrong_ret2, 0}, :def, [line: 16], [{[line: 16], [], [], 1234}]},
     {{:wrong_ret, 0}, :def, [line: 13, context: GeneratingCode],
      [{[line: 13, context: GeneratingCode], [], [], 1234}]}
   ],
...
 }}

Notice above that the wrong_ret function definition actually includes context: GeneratingCode, hinting to us that it is Elixir-compiler-generated code!

What do you think @luk-pau-es @erszcz? A couple of questions specifically:

  1. Am I on the right track with looking into the usages of :erl_anno.generated/1 inside Gradient.AstSpecifier.run_mappers? I'm not sure whether that's the right spot to try to mark code as generated, in order to suppress warnings for that code.
  2. Do you think using :beam_lib.chunks(path, [:debug_info]) to get the debug info in addition to the erlang AST is a good idea to determine whether code is generated or not? One concern might be performance, since we'd be fetching an extra AST for each file we examine.