ExUnit.CaptureLog.remove_capture fails with error

samsondav commented 5 years ago

Environment

Erlang/OTP 21 [erts-10.0.7] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Elixir 1.7.3 (compiled with Erlang/OTP 21)

Current behavior

I changed this line:

ExUnit.start()

to

ExUnit.start(capture_log: true)

in order to silence log output during tests. However, some tests are now failing with this error message:

     ** (EXIT from #PID<0.1226.0>) exited in: ExUnit.CaptureLog.remove_capture(#PID<0.1241.0>)
         ** (EXIT) {:error, :module_not_found}

Any ideas?

josevalim commented 5 years ago

So I cannot reproduce this. Here is a snippet that defines 100 test modules with 100 tests each:

require Logger
ExUnit.start(capture_log: true)

for i <- 1..100 do
  defmodule Module.concat(["Test#{i}"]) do
    use ExUnit.Case, async: :rand.uniform > 0.5

    for i <- 1..100 do
      test :"#{i}" do
        Logger.debug "ok"
      end
    end
  end
end

It passes just fine.

Looking at the code, the only situation I could see this happening is if the Logger supervisor restarts, maybe because a custom backend is failing? Are you using any custom backends? Does removing them reflect the failures?

samsondav commented 5 years ago

We do not use any custom logger backends. Our setup looks like this:

config :logger, :console, metadata: :all
config :logger, :console, level: :error

NB: We do use a custom backend in prod mode, but not in test mode.

samsondav commented 5 years ago

@josevalim I'm not really sure what else I can give you. The failures are intermittent - sometimes a test will fail and sometimes it will pass. But it is always the same tests that flicker.

Is there anything else I can share with you that will help us to debug this?

josevalim commented 5 years ago

Maybe one option is for you to add a file logger backend and then see if the file backend has logged any useful information because the issue in cases like this is that something else may have been logged but we wouldn't know because there is no logger backend.

fertapric commented 5 years ago

I managed to reproduce it:

require Logger

Application.put_env(:logger, :console, [
  format: "\n$time $metadata[$level] $levelpad$message\n",
  metadata: :all
])

ExUnit.start(capture_log: true)

defmodule Module.CaptureLogError do
  use ExUnit.Case

  test "boom!" do
    Logger.metadata(boom: <<1::1>>)
    Logger.info "boom!"
  end
end

fertapric commented 5 years ago

@samphilipd it should not fail if you remove the metadata: :all config. The next step would be to figure out what's killing the capture log handler.

josevalim commented 5 years ago

Metadata has to be printable. But that shouldn’t kill the handler either. --

José Valimwww.plataformatec.com.br http://www.plataformatec.com.br/Founder and Director of R&D

fertapric commented 5 years ago

Here is my hypothesis.

The StringIO device does not handle well those cases of non-printable metadata, raising an un-rescued ArgumentError when calling :unicode.characters_to_binary/3, and killing the device process:

https://github.com/elixir-lang/elixir/blob/00e488d28225c9b110e790f53078d321e0259495/lib/elixir/lib/string_io.ex#L274-L282

The handler monitors the device when logging, and if the device goes down, the handler raises:

https://github.com/elixir-lang/elixir/blob/00e488d28225c9b110e790f53078d321e0259495/lib/logger/lib/logger/backends/console.ex#L72-L74

which I believe kills the handler.

This doesn't happen with the :user device, which replies with an {:error, {:put_chars, :unicode, <bad data>}} if you provide non-printable chars.

I think everything should work if the ArgumentError is rescued in StringIO.

Thoughts?

josevalim commented 5 years ago

Awesome, you got it:

iex(2)> :io.put_chars <<1::1>>
** (ArgumentError) argument error
    (stdlib) :io.put_chars(#PID<0.63.0>, :unicode, <<1::size(1)>>)
iex(2)> {:ok, pid} = StringIO.open("")
{:ok, #PID<0.112.0>}
iex(3)> :io.put_chars pid, <<1::1>>
10:15:57.990 [error] GenServer #PID<0.112.0> terminating
** (ArgumentError) argument error
    (stdlib) :unicode.characters_to_binary(<<1::size(1)>>, :unicode, :unicode)
    (elixir) lib/string_io.ex:275: StringIO.put_chars/4
    (elixir) lib/string_io.ex:199: StringIO.io_request/4
    (elixir) lib/string_io.ex:177: StringIO.handle_info/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:io_request, #PID<0.105.0>, #Reference<0.656754156.1514405889.157709>, {:put_chars, :unicode, <<1::size(1)>>}}
State: %{capture_prompt: false, input: "", output: ""}
** (EXIT from #PID<0.105.0>) shell process exited with reason: an exception was raised:
    ** (ArgumentError) argument error
        (stdlib) :unicode.characters_to_binary(<<1::size(1)>>, :unicode, :unicode)
        (elixir) lib/string_io.ex:275: StringIO.put_chars/4
        (elixir) lib/string_io.ex:199: StringIO.io_request/4
        (elixir) lib/string_io.ex:177: StringIO.handle_info/2
        (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
        (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
        (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

And I know the default IO does rescue unicode conversion errors and return {:error, :badarg} from them, so you are correct regarding the fix too.

fertapric commented 5 years ago

The fix for non-printable data in Logger's metadata is fixed in master.

@samphilipd could you confirm that was the issue in your case?

josevalim commented 5 years ago

Closing as it has been fixed.

AndrewDryga commented 4 years ago

@josevalim I think we are facing a regression or similar bug:

Elixir 1.9.4 (compiled with Erlang/OTP 22)

0) ImaginatorWeb.TransformControllerTest: failure on setup_all callback, all tests have been invalidated
     ** (EXIT from #PID<0.763.0>) exited in: ExUnit.CaptureLog.remove_capture(#PID<0.764.0>)
         ** (EXIT) {:error, :module_not_found}

  0) ImaginatorWeb.InfoControllerTest: failure on setup_all callback, all tests have been invalidated
     ** (EXIT from #PID<0.777.0>) exited in: ExUnit.CaptureLog.remove_capture(#PID<0.778.0>)
         ** (EXIT) {:error, :module_not_found}

We are using unmodified generated ConnCase and do not add any setup_all callbacks.

Removing config :logger, :console, metadata: :all solves the issue but we can't do that as we have assertions on Logger metadata.

josevalim commented 4 years ago

Do you have any way to reproduce the error? You can try running the suite with SASL logs enabled, as they may tell the actual reason why the handler was removed. --

José Valimwww.plataformatec.com.br http://www.plataformatec.com.br/Founder and Director of R&D

AndrewDryga commented 4 years ago

@josevalim So the issue is that we have two backends configured in our app, LoggerJSON which we use in production and :console which is used in test environment, test.exs:

config :logger,
  backends: [:console],
  level: :info

But Elixir :console logger backend still receives fragments encoded by LoggerJSON.Plug as an event:

ERROR REPORT==== 22-Dec-2019::18:57:05.495250 ===
** gen_event handler {'Elixir.Logger.Backends.Console',<0.973.0>} crashed.
** Was installed in 'Elixir.Logger'
** Last event was: {info,<0.62.0>,
                       {'Elixir.Logger',<<>>,
                           {{2019,12,22},{11,57,5,495}},
                           [{pid,<0.975.0>},
                            {node,
                                #{'__struct__' => 'Elixir.Jason.Fragment',
                                  encode =>
                                      #Fun<Elixir.LoggerJSON.Plug.MetadataFormatters.GoogleCloudLogger.1.58295238>}},
                            {httpRequest,
                                #{'__struct__' => 'Elixir.Jason.Fragment',
                                  encode =>
                                      #Fun<Elixir.LoggerJSON.Plug.MetadataFormatters.GoogleCloudLogger.0.58295238>}},

                            {phoenix,
                                #{'__struct__' => 'Elixir.Jason.Fragment',
                                  encode =>
                                      #Fun<Elixir.LoggerJSON.Plug.MetadataFormatters.GoogleCloudLogger.3.58295238>}},
                            {line,44},
                            {function,<<"call/2">>},
                            {module,'Elixir.LoggerJSON.Plug'},
                            {file,<<"lib/logger_json/plug.ex">>},
                            {application,logger_json},
                            {request_id,<<"FeKvibPMtoinvXMAAAiK">>}]}}
** When handler state == #{'__struct__' => 'Elixir.Logger.Backends.Console',
                           buffer => [],buffer_size => 0,
                           colors =>
                               #{debug => cyan,enabled => true,error => red,
                                 info => normal,warn => yellow},
                           device => <0.973.0>,
                           format =>
                               [<<"\n">>,time,<<" ">>,metadata,<<"[">>,level,
                                <<"] ">>,levelpad,message,<<"\n">>],
                           level => nil,max_buffer => 32,metadata => all,
                           output => nil,ref => nil}
** Reason == {#{'__exception__' => true,
                '__struct__' => 'Elixir.Protocol.UndefinedError',
                description => <<>>,protocol => 'Elixir.String.Chars',
                value =>
                    #{'__struct__' => 'Elixir.Jason.Fragment',
                      encode =>
                          #Fun<Elixir.LoggerJSON.Plug.MetadataFormatters.GoogleCloudLogger.1.58295238>}},
              [{'Elixir.String.Chars','impl_for!',1,
                   [{file,"lib/string/chars.ex"},{line,3}]},
               {'Elixir.String.Chars',to_string,1,
                   [{file,"lib/string/chars.ex"},{line,22}]},
               {'Elixir.Logger.Formatter','-output/5-fun-0-',1,
                   [{file,"lib/logger/formatter.ex"},{line,166}]},
               {'Elixir.Enum','-map/2-lists^map/1-0-',2,
                   [{file,"lib/enum.ex"},{line,1336}]},
               {'Elixir.Enum','-map/2-lists^map/1-0-',2,
                   [{file,"lib/enum.ex"},{line,1336}]},
               {'Elixir.Logger.Formatter','-format/5-fun-0-',6,
                   [{file,"lib/logger/formatter.ex"},{line,152}]},

So, in the end, it crashes because it can't encode Jason.Fragment prepared by plug.

josevalim commented 4 years ago

As far as I know if you have the LoggerJSON plug in your pipeline, it will always add Jason metadata. So you need to configure :metadata accordingly instead of :all.

AndrewDryga commented 4 years ago

That is true and it's purely application error. Thank you

datafoo commented 4 years ago

I am having the exact same problem with elixir 1.10.0.

$ elixir --version
Erlang/OTP 22 [erts-10.5.5] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1] [hipe]

Elixir 1.10.0 (compiled with Erlang/OTP 22)

I made a minimal project with the code above and here is what I get:

$ mix test

  0) Myapp.CaptureLogErrorTest: failure on setup_all callback, all tests have been invalidated
     ** (EXIT from #PID<0.150.0>) exited in: ExUnit.CaptureLog.remove_capture(#PID<0.151.0>)
         ** (EXIT) {:error, :module_not_found}

Finished in 0.05 seconds
0 failures

Randomized with seed 548524

Can you reproduce?

josevalim commented 4 years ago

Thanks, I was able to reproduce it. It is related to #9814 and it has been fixed on master and v1.10 branch. We will release v1.10.2 in a week with the fix.

maennchen commented 3 years ago

@josevalim Could there be another regression? We have the same issue with quantum. (I did not verify it myself yet.)

https://github.com/quantum-elixir/quantum-core/issues/466

elixir-lang / elixir

ExUnit.CaptureLog.remove_capture fails with error #8507

Environment

Current behavior