bitwalker / distillery

Simplify deployments in Elixir with OTP releases!
MIT License
2.97k stars 398 forks source link

Read `start_permanent` from mix.exs #118

Closed AndrewDryga closed 7 years ago

AndrewDryga commented 7 years ago

Right now Distillery (and exrm) ignores start_permanent option in mix.exs and this behavior is ambiguous for most of developers that I know (few project in production have issues because of it).

Of course, we can add bold text that "you should set application type in rel/config.ex", but other option is more developer-friendly - to read application start type from Mix.exs and use it as default value.

bitwalker commented 7 years ago

The top-level application in a release will always be started as permanent, so this setting doesn't actually apply to releases. Was there a specific issue you encountered?

AndrewDryga commented 7 years ago

In some of our workers when connection pool is overloaded, GenServer (it is added to a application supervisor as worker) will fail due to pool connection timeout. This failure results in reached max restart intensity error, then application supervisor should fail and kill erlang vm, but it lives with dead GenServer and application is not working.

bitwalker commented 7 years ago

@AndrewDryga Do you have an example app I can use to reproduce?

AndrewDryga commented 7 years ago

Unfortunately I don't have ways to reproduce it, but I have logs from production:

12:05:58.289 [info]  Raise processing flag: 5. Batch: "74ff28ea9920dc6ddccbbb25ed823b69d99e8bca".
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1397.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1398.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1399.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.672 [error] Postgrex.Protocol (#PID<0.1400.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.673 [error] Postgrex.Protocol (#PID<0.1401.0>) failed to connect: ** (Postgrex.Error) tcp connect: non-existing domain - :nxdomain
12:06:05.673 [warn]  Loan from batch 09c0d1688a1ed0bd9b46831ec210da4024190ab7 not found. Removing task.
12:05:38.874 [error] GenServer #PID<0.1325.0> terminating
** (stop) no process
    (stdlib) proc_lib.erl:794: :proc_lib.stop/3
    (ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
    (ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
    (stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:681: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:38 ===
  crasher:
    initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
    pid: <0.1325.0>
    registered_name: []
    exception exit: noproc
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
                  'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
    messages: []
    links: [<0.1291.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 990
  neighbours:
12:05:38.877 [error] GenServer #PID<0.1321.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
    ** (EXIT) normal
    (stdlib) proc_lib.erl:796: :proc_lib.stop/3
    (ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
    (ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
    (stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:681: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:38 ===
  crasher:
    initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
    pid: <0.1321.0>
    registered_name: []
    exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
                  'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
    messages: []
    links: [<0.1291.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 954
  neighbours:
12:05:39.301 [error] GenServer #PID<0.1322.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
    ** (EXIT) normal
    (stdlib) proc_lib.erl:796: :proc_lib.stop/3
    (ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
    (ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
    (stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:681: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:39 ===
  crasher:
    initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
    pid: <0.1322.0>
    registered_name: []
    exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
                  'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
    messages: []
    links: [<0.1291.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 962
  neighbours:
12:05:39.303 [error] GenServer #PID<0.1323.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
    ** (EXIT) normal
    (stdlib) proc_lib.erl:796: :proc_lib.stop/3
    (ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
    (ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
    (stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:681: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:39 ===
  crasher:
    initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
    pid: <0.1323.0>
    registered_name: []
    exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
                  'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
    messages: []
    links: [<0.1291.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 970
  neighbours:
12:05:39.304 [error] GenServer #PID<0.1324.0> terminating
** (stop) exited in: :sys.terminate(#PID<0.1311.0>, :normal, :infinity)
    ** (EXIT) normal
    (stdlib) proc_lib.erl:796: :proc_lib.stop/3
    (ap_master) lib/workers/master.ex:222: AssetProcessor.Workers.MasterProcess.get_staging_loan/2
    (ap_master) lib/workers/master.ex:32: AssetProcessor.Workers.MasterProcess.handle_info/2
    (stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:681: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
=CRASH REPORT==== 2-Nov-2016::12:05:39 ===
  crasher:
    initial call: Elixir.AssetProcessor.Workers.MasterProcess:init/1
    pid: <0.1324.0>
    registered_name: []
    exception exit: {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
      in function  gen_server:terminate/7 (gen_server.erl, line 826)
    ancestors: ['Elixir.AssetProcessor.Workers.Supervisor',
                  'Elixir.AssetProcessor.MasterProcess.Supervisor',<0.1231.0>]
    messages: []
    links: [<0.1291.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 978
  neighbours:
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
     Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
     Context:    child_terminated
     Reason:     noproc
     Offender:   [{pid,<0.1325.0>},
                  {id,'Elixir.AssetProcessor.Workers.MasterProcess'},
                  {mfargs,
                      {'Elixir.AssetProcessor.Workers.MasterProcess',
                          start_link,
                          [#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
                             <<"id">> => 7},
                           7]}},
                  {restart_type,transient},
                  {shutdown,infinity},
                  {child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
     Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
     Context:    child_terminated
     Reason:     {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
     Offender:   [{pid,<0.1321.0>},
                  {id,'Elixir.AssetProcessor.Workers.MasterProcess'},
                  {mfargs,
                      {'Elixir.AssetProcessor.Workers.MasterProcess',
                          start_link,
                          [#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
                             <<"id">> => 3},
                           3]}},
                  {restart_type,transient},
                  {shutdown,infinity},
                  {child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
     Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
     Context:    child_terminated
     Reason:     {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
     Offender:   [{pid,<0.1322.0>},
                  {id,'Elixir.AssetProcessor.Workers.MasterProcess'},
                  {mfargs,
                      {'Elixir.AssetProcessor.Workers.MasterProcess',
                          start_link,
                          [#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
                             <<"id">> => 4},
                           4]}},
                  {restart_type,transient},
                  {shutdown,infinity},
                  {child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
     Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
     Context:    child_terminated
     Reason:     {normal,{sys,terminate,[<0.1311.0>,normal,infinity]}}
     Offender:   [{pid,<0.1323.0>},
                  {id,'Elixir.AssetProcessor.Workers.MasterProcess'},
                  {mfargs,
                      {'Elixir.AssetProcessor.Workers.MasterProcess',
                          start_link,
                          [#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
                             <<"id">> => 5},
                           5]}},
                  {restart_type,transient},
                  {shutdown,infinity},
                  {child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
     Supervisor: {local,'Elixir.AssetProcessor.Workers.Supervisor'}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.1323.0>},
                  {id,'Elixir.AssetProcessor.Workers.MasterProcess'},
                  {mfargs,
                      {'Elixir.AssetProcessor.Workers.MasterProcess',
                          start_link,
                          [#{<<"batch_id">> => <<"8a08296000703cd36aabd3dd6a5ed48daa0aa289">>,
                             <<"id">> => 5},
                           5]}},
                  {restart_type,transient},
                  {shutdown,infinity},
                  {child_type,supervisor}]
=SUPERVISOR REPORT==== 2-Nov-2016::12:05:39 ===
     Supervisor: {local,'Elixir.AssetProcessor.MasterProcess.Supervisor'}
     Context:    child_terminated
     Reason:     shutdown
     Offender:   [{pid,<0.1291.0>},
                  {id,'Elixir.AssetProcessor.Workers.Supervisor'},
                  {mfargs,
                      {'Elixir.AssetProcessor.Workers.Supervisor',start_link,
                          []}},
                  {restart_type,permanent},
                  {shutdown,infinity},
                  {child_type,supervisor}]

mix.exs:

defmodule AssetProcessor.MasterProcess.Mixfile do
  use Mix.Project

  @version "0.1.9"

  def project do
    [app: :ap_master,
     description: "OrangeSky Asset Processor - Master Process Worker.",
     package: package,
     version: @version,
     elixir: "~> 1.3",
     elixirc_paths: elixirc_paths(Mix.env),
     compilers: [] ++ Mix.compilers,
     build_embedded: Mix.env == :prod,
     start_permanent: Mix.env == :prod,
     aliases: aliases(),
     deps: deps(),
     test_coverage: [tool: ExCoveralls],
     preferred_cli_env: [coveralls: :test],
     docs: [source_ref: "v#\{@version\}", main: "readme", extras: ["README.md"]]]
  end
...

rel/config.ex:

use Mix.Releases.Config,
  default_release: :default,
  default_environment: :default

environment :default do
  set pre_start_hook: "bin/hooks/pre-start.sh"
  set dev_mode: false
  set include_erts: false
  set include_src: false
end

release :ap_master do
  set version: current_version(:ap_master)
  set applications: [
    ap_master: :permanent
  ]
end

Container is built with MIX_ENV=prod env.

AndrewDryga commented 7 years ago

At this moment its a Kubernetes Docker container with stopped application supervisor but Erlang VM is alive. So it's neither working, nor restarting.

AndrewDryga commented 7 years ago

Another one:

=SUPERVISOR REPORT==== 7-Dec-2016::15:42:24 ===
     Supervisor: {local,'Elixir.Trader.Workers.Supervisor'}
     Context:    child_terminated
     Reason:     {#{'__exception__' => true,
                    '__struct__' => 'Elixir.Postgrex.Error',
                    connection_id => 16553,
                    message => nil,
                    postgres => #{code => undefined_column,
                      file => <<"parse_relation.c">>,
                      line => <<"3090">>,
                      message => <<"column b1.loans_invest_whole does not exist">>,
                      pg_code => <<"42703">>,
                      position => <<"350">>,
                      routine => <<"errorMissingColumn">>,
                      severity => <<"ERROR">>,
                      unknown => <<"ERROR">>}},
                  [{'Elixir.Ecto.Adapters.SQL',execute_and_cache,7,
                       [{file,"lib/ecto/adapters/sql.ex"},{line,415}]},
                   {'Elixir.Ecto.Repo.Queryable',execute,5,
                       [{file,"lib/ecto/repo/queryable.ex"},{line,121}]},
                   {'Elixir.Ecto.Repo.Queryable',all,4,
                       [{file,"lib/ecto/repo/queryable.ex"},{line,35}]},
                   {'Elixir.Ecto.Repo.Queryable',one,4,
                       [{file,"lib/ecto/repo/queryable.ex"},{line,59}]},
                   {'Elixir.Trader.Workers.Analyzer',do_task,2,
                       [{file,"lib/workers/analyzer.ex"},{line,95}]},
                   {'Elixir.Trader.Workers.Analyzer',handle_info,2,
                       [{file,"lib/workers/analyzer.ex"},{line,50}]},
                   {gen_server,try_dispatch,4,
                       [{file,"gen_server.erl"},{line,615}]},
                   {gen_server,handle_msg,5,
                       [{file,"gen_server.erl"},{line,681}]}]}
     Offender:   [{pid,<0.1425.0>},
                  {id,'Elixir.Trader.Workers.Analyzer'},
                  {mfargs,
                      {'Elixir.Trader.Workers.Analyzer',start_link,
                          [#{<<"buckets">> => [#{<<"actual_volume">> => 0,<<"bucket_id">> => 1}],
                             <<"portfolio_subscription_id">> => 1},
                           1]}},
                  {restart_type,transient},
                  {shutdown,5000},
                  {child_type,worker}]
=SUPERVISOR REPORT==== 7-Dec-2016::15:42:24 ===
     Supervisor: {local,'Elixir.Trader.Workers.Supervisor'}
     Context:    shutdown
     Reason:     reached_max_restart_intensity
     Offender:   [{pid,<0.1425.0>},
                  {id,'Elixir.Trader.Workers.Analyzer'},
                  {mfargs,
                      {'Elixir.Trader.Workers.Analyzer',start_link,
                          [#{<<"buckets">> => [#{<<"actual_volume">> => 0,<<"bucket_id">> => 1}],
                             <<"portfolio_subscription_id">> => 1},
                           1]}},
                  {restart_type,transient},
                  {shutdown,5000},
                  {child_type,worker}]
=SUPERVISOR REPORT==== 7-Dec-2016::15:42:24 ===
     Supervisor: {local,'Elixir.Trader.GapAnalyzer.Supervisor'}
     Context:    child_terminated
     Reason:     shutdown
     Offender:   [{pid,<0.1245.0>},
                  {id,'Elixir.Trader.Workers.Supervisor'},
                  {mfargs,{'Elixir.Trader.Workers.Supervisor',start_link,[]}},
                  {restart_type,permanent},
                  {shutdown,infinity},
                  {child_type,supervisor}]
AndrewDryga commented 7 years ago

It seems that this is not Distillery's fault, rather unexpected (for me) behaviour of transient supervisor.