Closed SimonWoolf closed 8 years ago
It is failing because it thinks the function Elixir.Inspect.List.inspect/2 is not available. I would look for the following things:
Hi Jose! Thanks for the quick reply :)
See if Elixir.Inspect.List is present inside Elixir in the slug (or check if it is available from the console)
I'm a little confused as to how it can not be, but you're right, it isn't. IO.puts :erlang.function_exported(Elixir.Inspect.List, :inspect, 2)
from the app prints true
when running the app locally, but false
in production.
Not sure how else to check if it's 'present inside Elixir'? Grepping the _build
directory for Elixir.Inspect.List
produces "Binary file Elixir.Inspect.beam matches", if that helps.
See if you are not accidentally redefining it
I'm not doing anything that redefines any Elixir builtins, so unless I'm missing something, I'm afraid I don't see how I could be. (And as I said, inspect
works fine when running the app locally, even with compiling and running with MIX_ENV=prod)
Maybe it is a failed build issue.
I've added always_rebuild=true
to the elixir_buildpack.config
, but the issue remains. (to be clear I'm not actually using heroku itself, just using the buildpack to build a slug, which is then run with mix run --no-halt
by docker).
I'm a little confused as to how it can not be, but you're right, it isn't. IO.puts :erlang.function_exported(Elixir.Inspect.List, :inspect, 2) from the app prints true when running the app locally, but false in production.
This seems to be the root cause. What does Code.ensure_loaded?(Elixir.Inspect.List)
return?
Not sure how else to check if it's 'present inside Elixir'?
Elixir.Inspect.List
is provided as part of Elixir. I am assuming somewhere in your slug, the Elixir project was downloaded and compiled. I am assuming that Elixir.Inspect.List.beam
is either missing from the elixir project (not _build) in your slug or it somehow has a bogus version.
This seems to be the root cause. What does Code.ensure_loaded?(Elixir.Inspect.List) return?
After some experimentation -- it seems to be random :(
Code.ensure_loaded?(Elixir.Inspect.List)
returns true
. In this case:
function_exported(Elixir.Inspect.List, :inspect, 2)
calls return true
inspect
calls work correctly (whether List, Map, whatever). (After a successful inspect
of a type, the relevant function_exported
calls start returning true)Code.ensure_loaded?(Elixir.Inspect.List)
returns false
. In this case:
function_exported(Elixir.Inspect.List, :inspect, 2)
calls continue to return false
inspect
calls fail (or if done by Logger, crash logger)Trying the non-boolean version: I obviously can't IO.inspect
the error tuple returned by Code.ensure_loaded
if it fails, but since all elements are atoms I can extract and IO.puts
them, and: the error returned is {:error, :nofile}
.
To be clear, this isn't just nondeterminism between builds, it's between deploys of a single build. I.e. I can deploy the exact same slug (same SHA) several times, and sometimes ensure_loaded will fail with :nofile
, and sometimes it'll work.
That's quite surprising.
I can deploy the exact same slug (same SHA) several times
When you mention "deploy" the same slug, does it compile the code every time you deploy or you are referring to the exact same compiled artifacts? Would you be able to retrieve the .beam
file for cases where it works and where it fails and upload somewhere so we can analyze them?
Thank you for following up.
When you mention "deploy" the same slug, does it compile the code every time you deploy or you are referring to the exact same compiled artifacts? Would you be able to retrieve the .beam file for cases where it works and where it fails and upload somewhere so we can analyze them?
Sorry for ambiguity - I mean literally the exact same compiled artifacts. heroku-elixir-buildpack spits out a tarball with everything compiled, that's what I'm referring to by the 'slug'. 'deploying the slug' is just starting a docker container with the slug extracted to /app, and running this script (passing in mix run --no-halt --no-deps-check
).
I've put the beam file from a slug up here in case it helps. Since it does sometimes work, I'm guessing everything is in there that should be else it couldn't ever work. :/
Thank you for following up.
No, thank you for helping! I realise this must be frustrating to debug remotely.
After some more experimentation, turns out this bug only ever happens if the very first call to inspect
for some datatype is done from a cowboy handler process.
In other words, the issue completely disappears if I do something like Logger.debug "Some data structures: #{inspect {%{}, [], MapSet.new, 0.1, 1..2, ~r//, :a, <<1>>}}"
from one of my apps' application callback functions (on start).
(I've also made sure it wasn't an issue with one of my other deps - I can reproduce the issue with a dummy app with no external deps except cowboy (1.0.0), and no other code except an app callback that starts cowboy and one http handler that does an IO.inspect).
The workaround solves the pressing issue for us, but of course I'm happy to keep trying any experiments you want me to to track the underlying bug down.
This is really helpful. It seems to be related to Erlang's code loading then. When building releases, we should not have such problems, as all modules are preloaded upfront. Great job and thank you!
I will leave this open for now if I get a chance to investigate it later, specially to find out why it happens only with inspect
.
We think we know why this was happening now, and it was our fault. Not a bug in elixir or erlang. Sorry for the invalid bug report :(
If you're interested:
Since the things in /tmp are overlayfs upper directories, this cleanup script didn't affect the underlying slug, which was still correct. But it did mark files as 'deleted' in overlayfs, so they were hidden from the docker container.
Result: random sadness, depending on a combination of when each .beam file was last modified, and which files had already been loaded by the erlang vm by the time the cleanup script ran. (So I guess my workaround only worked because it meant those files were loaded early on - and presumably most other protocols were either exercised early on or else not at all, so the problem only showed up with inspect
).
We've now moved the overlayfs upperdirs out of /tmp, and everything is working fine. (Why this hasn't shown up as an issue for us before, who knows..)
Thank you for your help earlier, and apologies again!
I am glad this was fixed! :heart:
Has this problem be reintroduced in 1.13.0 ? My application crashes on inspect in production
Wasn't sure whether to report this as not certain it's an elixir/logger bug rather than something I'm doing wrong (tried slack and SO first), but on I figure logger crashing seems likely to be a bug that I should report even if I am doing something odd. If it isn't a bug, many apologies.
So: the first
inspect
call that happens in production causes a logger crash, claiming that egElixir.Inspect.List.inspect
is undefined:The code that triggers it is just
Here,
:cowboy_req.qs_vals(request) |> elem(0)
is just[]
, so the line that crashes is equivalent toinspect([])
.Elixir version: 1.2.2 Erlang version: 18.2.1 Presumably this is related to Elixir 1.2 consolidating the inspect protocol by default? I'm using
build_embedded: Mix.env == :prod
, so protocols are being consolidated. (and eg_build/prod/consolidated/Elixir.Inspect.beam
is present in the slug).We're using the heroku elixir buildpack to make slugs, which are then run using docker.
The weird thing is, this crash only happens in production. The same app running locally -- even when using MIX_ENV=prod, running from the same slug that runs in production -- does not have any problems.
Let me know what other details I can provide to help.
Thanks in advance :)