elixir-lang / elixir

Elixir is a dynamic, functional language for building scalable and maintainable applications
https://elixir-lang.org/
Apache License 2.0
24.55k stars 3.38k forks source link

Macros may leak hygiene info into compiled modules #7947

Closed bitwalker closed 6 years ago

bitwalker commented 6 years ago

Environment

Erlang/OTP 21 [erts-10.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Elixir 1.6.6 (compiled with OTP 19)

Current behavior

To give you some background on how I discovered this, I'm building an extension library for ExUnit which provides helpers for running tests against a clustered application. I got most of the way done and was writing tests, when I encountered something strange: a subset of my tests worked fine, but when using my custom Case implementation, equivalent tests would fail with a :badfun error.

The way the scenario/3 macro works in the latter of those two examples, is that it will generate more or less something like the following:

@attribute_with_scenario_config scenario_config
describe "scenario description" do
  setup context do
    # initializes a cluster or gets the pid of the one already running for this scenario
    cluster = ...
    Map.put(context, :cluster, cluster)
  end

  unquote(block)
end

Initially I thought it was something in how I wrote this, but I did a sanity check by simply wrapping my working test module (the first of the two links above) in a describe, and running them again - this time the tests failed with :badfun!

The cause of the :badfun is that the module md5 is changing on every compilation. This is a problem for my use case in particular because :erl_boot_server and :erl_prim_loader require a file on disk in order to load code remotely, and there is no API for fetching the BEAM binary for modules compiled in memory so that I can use :code.load_binary/3 via RPC. Since test modules are compiled in memory, I don't have a way to send those modules to nodes in a cluster. Instead, my workaround for this, is that I load test module code onto the nodes in a cluster using more or less the same method that the test compiler uses to compile them for a normal test run, i.e., finding the set of files which match the test file pattern, and compiling them via Kernel.ParallelCompiler.require/2.

What I found, by inspecting the output of :beam_lib.chunks(module_bin, [:compile_info]), is that the md5 of the module is different on every compile. By then looking at the :abstract_code chunk, I found that by adding describe to my cluster_test.exs module (first link above), a counter was added in three places:

⟩ diff -u abstract_code{1,2}.erl
--- abstract_code1.erl  2018-07-20 14:25:39.000000000 -0400
+++ abstract_code2.erl  2018-07-20 14:25:39.000000000 -0400
@@ -204,7 +204,7 @@
                           {cons,0,
                            {tuple,0,
                             [{atom,0,counter},
-                             {integer,0,-576460752303372252}]},
+                             {integer,0,-576460752303398107}]},
                            {nil,0}}},
                          {cons,0,
                           {atom,0,'ExUnit'},
@@ -355,7 +355,7 @@
                           {cons,0,
                            {tuple,0,
                             [{atom,0,counter},
-                             {integer,0,-576460752303372252}]},
+                             {integer,0,-576460752303398107}]},
                            {nil,0}}},
                          {cons,0,
                           {atom,0,'ExUnit'},
@@ -400,7 +400,7 @@
                                      {cons,0,
                                       {tuple,0,
                                        [{atom,0,counter},
-                                        {integer,0,-576460752303372252}]},
+                                        {integer,0,-576460752303398107}]},
                                       {nil,0}}},
                                     {cons,0,{atom,0,'Node'},{nil,0}}]},
                                   {cons,0,{atom,0,self},{nil,0}}}]},

Since the value of this counter is different on every compilation, the md5 changes as well.

I brought this up with José, and his opinion was that likely some macro env info (namely the counter for hygiene) is being leaked into the compiled module. From what I can recall when looking at the abstract code, there were other bits of macro env in there as well (like line numbers and aliases), but the counter is the one which causes the issue.

Expected behavior

Compiling a module with the same contents, in the same compilation environment (e.g. there isn't code in the module deliberately generating dynamic content each time) more than once should produce the same BEAM bytecode, most importantly, the module md5 should not change.

josevalim commented 6 years ago

Thanks @bitwalker. I need a bit more information to reproduce this, as the code below always has the same version:

ExUnit.start

{_, _, binary, _} = defmodule Foo do
  use ExUnit.Case

  describe "omg" do
    test "sample" do
      assert :ok
    end
  end
end

:beam_lib.chunks(binary, [:attributes]) |> IO.inspect

Thoughts?

bitwalker commented 6 years ago

Change it to:

defmodule Foo do
  use ExUnit.Case

  alias Node, as: N

  describe "omg" do
    test "sample" do
      assert Node.self == N.self
    end
  end
end
bitwalker commented 6 years ago

To be clear, here's how I'm testing:

  1. Run iex
  2. Application.ensure_all_started(:ex_unit) (though ExUnit.start works too)
  3. Run the following:
[{_, bin1}] = "test.exs" |> File.read! |> Code.compile_string();
[{_, bin2}] = "test.exs" |> File.read! |> Code.compile_string();
c1 = :beam_lib.chunks(bin1, [:abstract_code]);
c2 = :beam_lib.chunks(bin2, [:abstract_code]);
File.write!("abstract1.erl", :io_lib.format('~p\n', [c1]));
File.write!("abstract2.erl", :io_lib.format('~p\n', [c2]))

Then run diff -u abstract{1,2}.erl to see the unified diff. This approach is a little more verbose than dumping the attributes or compile info, but shows the actual difference being factored into the md5 generation.

bitwalker commented 6 years ago

I found one without the alias definition that works as well:

defmodule Foo do
  use ExUnit.Case

  describe "omg" do
    test "sample" do
      assert [] = Enum.to_list([])
    end
  end
end
bitwalker commented 6 years ago

@josevalim I think I've got it pinned down: using the example above, if you change Enum to :'Elixir.Enum', the alias metadata doesn't end up in the compiled code (which contains the counter). It seems like the use of any module atom results in alias metadata generated, plain atoms, e.g. :foo or :'Elixir.Enum' do not.

josevalim commented 6 years ago

So the bug is in escape_quoted/2 inside ExUnit.Assertions. It escapes the AST but since the AST has dynamic counters in it, it ends-up changing the overall module AST.

We need to have a version of escape that also skips counters.

bitwalker commented 6 years ago

Maybe another clause of :elixir_quote.escape which takes a proplist/keyword list of options, allowing you to toggle the booleans in the :elixir_quote record, such as aliases_hygiene? Seems like that's the one we need based on this clause, assuming I'm reading it correctly.

josevalim commented 6 years ago

@bitwalker yes, precisely.

josevalim commented 6 years ago

@bitwalker I am on this btw. :)

bitwalker commented 6 years ago

Sounds good :) - I'm happy to do some testing once your fix is in master or on a branch I can checkout, so I'll keep an eye out here and do that as soon as it's available!

josevalim commented 6 years ago

Closing in favor of #7949 .