Closed AndrewDryga closed 7 years ago
The top-level application in a release will always be started as permanent, so this setting doesn't actually apply to releases. Was there a specific issue you encountered?
In some of our workers when connection pool is overloaded, GenServer (it is added to a application supervisor as worker) will fail due to pool connection timeout. This failure results in reached max restart intensity
error, then application supervisor should fail and kill erlang vm, but it lives with dead GenServer and application is not working.
@AndrewDryga Do you have an example app I can use to reproduce?
Unfortunately I don't have ways to reproduce it, but I have logs from production:
12:05:58.289 [info] Raise processing flag: 5. Batch: "74ff28ea9920dc6ddccbbb25ed823b69d99e8bca".
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1397.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1398.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1399.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1400.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.673 [error] Postgrex.Protocol (#PID<0.1401.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.673 [warn] Loan from batch 09c0d1688a1ed0bd9b46831ec210da4024190ab7 not found. Removing task.
12:05:38.874 [error] GenServer #PID<0.1325.0> terminating
** (stop) no process
(stdlib) proc_lib.erl:794: :proc_lib.stop/3
(ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
(ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
(stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:681: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:38 ===
crasher:
initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
pid: <0.1325.0>
registered_name: []
exception exit: noproc
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
messages: []
links: [<0.1291.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 990
neighbours:
12:05:38.877 [error] GenServer #PID<0.1321.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
** (EXIT) normal
(stdlib) proc_lib.erl:796: :proc_lib.stop/3
(ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
(ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
(stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:681: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:38 ===
crasher:
initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
pid: <0.1321.0>
registered_name: []
exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
messages: []
links: [<0.1291.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 27
reductions: 954
neighbours:
12:05:39.301 [error] GenServer #PID<0.1322.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
** (EXIT) normal
(stdlib) proc_lib.erl:796: :proc_lib.stop/3
(ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
(ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
(stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:681: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:39 ===
crasher:
initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
pid: <0.1322.0>
registered_name: []
exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
messages: []
links: [<0.1291.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 27
reductions: 962
neighbours:
12:05:39.303 [error] GenServer #PID<0.1323.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
** (EXIT) normal
(stdlib) proc_lib.erl:796: :proc_lib.stop/3
(ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
(ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
(stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:681: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:39 ===
crasher:
initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
pid: <0.1323.0>
registered_name: []
exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
messages: []
links: [<0.1291.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 27
reductions: 970
neighbours:
12:05:39.304 [error] GenServer #PID<0.1324.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
** (EXIT) normal
(stdlib) proc_lib.erl:796: :proc_lib.stop/3
(ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
(ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
(stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:681: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:39 ===
crasher:
initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
pid: <0.1324.0>
registered_name: []
exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
in function gen_server:terminate/7 (gen_server.erl, line 826)
ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
messages: []
links: [<0.1291.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 27
reductions: 978
neighbours:
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
Context: child_terminated
Reason: noproc
Offender: [{pid,<0.1325.0>},
{id,'Elixir.AssetProcessor.Workers.MasterProcess'},
{mfargs,
{'Elixir.AssetProcessor.Workers.MasterProcess',
start_link,
[#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
<<"id">> => 7},
7]}},
{restart_type,transient},
{shutdown,infinity},
{child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
Context: child_terminated
Reason: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
Offender: [{pid,<0.1321.0>},
{id,'Elixir.AssetProcessor.Workers.MasterProcess'},
{mfargs,
{'Elixir.AssetProcessor.Workers.MasterProcess',
start_link,
[#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
<<"id">> => 3},
3]}},
{restart_type,transient},
{shutdown,infinity},
{child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
Context: child_terminated
Reason: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
Offender: [{pid,<0.1322.0>},
{id,'Elixir.AssetProcessor.Workers.MasterProcess'},
{mfargs,
{'Elixir.AssetProcessor.Workers.MasterProcess',
start_link,
[#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
<<"id">> => 4},
4]}},
{restart_type,transient},
{shutdown,infinity},
{child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
Context: child_terminated
Reason: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
Offender: [{pid,<0.1323.0>},
{id,'Elixir.AssetProcessor.Workers.MasterProcess'},
{mfargs,
{'Elixir.AssetProcessor.Workers.MasterProcess',
start_link,
[#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
<<"id">> => 5},
5]}},
{restart_type,transient},
{shutdown,infinity},
{child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.1323.0>},
{id,'Elixir.AssetProcessor.Workers.MasterProcess'},
{mfargs,
{'Elixir.AssetProcessor.Workers.MasterProcess',
start_link,
[#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
<<"id">> => 5},
5]}},
{restart_type,transient},
{shutdown,infinity},
{child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
Supervisor: {local,'Elixir.AssetProcessor.MasterProcess.Supervisor'}
Context: child_terminated
Reason: shutdown
Offender: [{pid,<0.1291.0>},
{id,'Elixir.AssetProcessor.Workers.Supervisor'},
{mfargs,
{'Elixir.AssetProcessor.Workers.Supervisor',start_link,
[]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,supervisor}]
mix.exs:
defmodule AssetProcessor.MasterProcess.Mixfile do
use Mix.Project
@version "0.1.9"
def project do
[app: :ap_master,
description: "OrangeSky Asset Processor - Master Process Worker.",
package: package,
version: @version,
elixir: "~> 1.3",
elixirc_paths: elixirc_paths(Mix.env),
compilers: [] ++ Mix.compilers,
build_embedded: Mix.env == :prod,
start_permanent: Mix.env == :prod,
aliases: aliases(),
deps: deps(),
test_coverage: [tool: ExCoveralls],
preferred_cli_env: [coveralls: :test],
docs: [source_ref: "v#\{@version\}", main: "readme", extras: ["README.md"]]]
end
...
rel/config.ex:
use Mix.Releases.Config,
default_release: :default,
default_environment: :default
environment :default do
set pre_start_hook: "bin/hooks/pre-start.sh"
set dev_mode: false
set include_erts: false
set include_src: false
end
release :ap_master do
set version: current_version(:ap_master)
set applications: [
ap_master: :permanent
]
end
Container is built with MIX_ENV=prod env.
At this moment its a Kubernetes Docker container with stopped application supervisor but Erlang VM is alive. So it's neither working, nor restarting.
Another one:
=SUPERVISOR REPORT==== 7-Dec-2016::15:42:24 ===
Supervisor: {local,'Elixir.Trader.Workers.Supervisor'}
Context: child_terminated
Reason: {#{'__exception__' => true,
'__struct__' => 'Elixir.Postgrex.Error',
connection_id => 16553,
message => nil,
postgres => #{code => undefined_column,
file => <<"parse_relation.c">>,
line => <<"3090">>,
message => <<"column b1.loans_invest_whole does not exist">>,
pg_code => <<"42703">>,
position => <<"350">>,
routine => <<"errorMissingColumn">>,
severity => <<"ERROR">>,
unknown => <<"ERROR">>}},
[{'Elixir.Ecto.Adapters.SQL',execute_and_cache,7,
[{file,"lib/ecto/adapters/sql.ex"},{line,415}]},
{'Elixir.Ecto.Repo.Queryable',execute,5,
[{file,"lib/ecto/repo/queryable.ex"},{line,121}]},
{'Elixir.Ecto.Repo.Queryable',all,4,
[{file,"lib/ecto/repo/queryable.ex"},{line,35}]},
{'Elixir.Ecto.Repo.Queryable',one,4,
[{file,"lib/ecto/repo/queryable.ex"},{line,59}]},
{'Elixir.Trader.Workers.Analyzer',do_task,2,
[{file,"lib/workers/analyzer.ex"},{line,95}]},
{'Elixir.Trader.Workers.Analyzer',handle_info,2,
[{file,"lib/workers/analyzer.ex"},{line,50}]},
{gen_server,try_dispatch,4,
[{file,"gen_server.erl"},{line,615}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,681}]}]}
Offender: [{pid,<0.1425.0>},
{id,'Elixir.Trader.Workers.Analyzer'},
{mfargs,
{'Elixir.Trader.Workers.Analyzer',start_link,
[#{<<"buckets">> => [#{<<"actual_volume">> => 0,<<"bucket_id">> => 1}],
<<"portfolio_subscription_id">> => 1},
1]}},
{restart_type,transient},
{shutdown,5000},
{child_type,worker}]
=SUPERVISOR REPORT==== 7-Dec-2016::15:42:24 ===
Supervisor: {local,'Elixir.Trader.Workers.Supervisor'}
Context: shutdown
Reason: reached_max_restart_intensity
Offender: [{pid,<0.1425.0>},
{id,'Elixir.Trader.Workers.Analyzer'},
{mfargs,
{'Elixir.Trader.Workers.Analyzer',start_link,
[#{<<"buckets">> => [#{<<"actual_volume">> => 0,<<"bucket_id">> => 1}],
<<"portfolio_subscription_id">> => 1},
1]}},
{restart_type,transient},
{shutdown,5000},
{child_type,worker}]
=SUPERVISOR REPORT==== 7-Dec-2016::15:42:24 ===
Supervisor: {local,'Elixir.Trader.GapAnalyzer.Supervisor'}
Context: child_terminated
Reason: shutdown
Offender: [{pid,<0.1245.0>},
{id,'Elixir.Trader.Workers.Supervisor'},
{mfargs,{'Elixir.Trader.Workers.Supervisor',start_link,[]}},
{restart_type,permanent},
{shutdown,infinity},
{child_type,supervisor}]
It seems that this is not Distillery's fault, rather unexpected (for me) behaviour of transient supervisor.
Right now Distillery (and exrm) ignores
start_permanent
option in mix.exs and this behavior is ambiguous for most of developers that I know (few project in production have issues because of it).Of course, we can add bold text that "you should set application type in
rel/config.ex
", but other option is more developer-friendly - to read application start type from Mix.exs and use it as default value.