derekkraan / horde

Horde is a distributed Supervisor and Registry backed by DeltaCrdt
MIT License

Horde DynamicSupervisor froze for multiple seconds #220

Closed philipgiuliani closed 4 years ago

philipgiuliani commented 4 years ago

Looking through our logs, I found many of these exceptions. They don't seem to have any negative effect on the environment, though.

07:36:57.588 [warn] Exit while fetching metrics from Example.Scheduler.Supervisor.
Skip poll action. Reason: {:timeout, {GenServer, :call, [Example.Scheduler.Supervisor, :get_telemetry, 5000]}}.

07:38:23.028 [error] GenServer Example.Scheduler.Supervisor.NodeListener terminating
** (stop) exited in: GenServer.call(Example.Scheduler.Supervisor, {:set_members, [{Example.Scheduler.Supervisor, :"hostname@172.31.6.70"}, {Example.Scheduler.Supervisor, :"hostname@172.31.28.46"}, {Example.Scheduler.Supervisor, :"hostname@172.31.44.66"}, {Example.Scheduler.Supervisor, :"hostname@172.31.41.100"}]}, 5000)
    ** (EXIT) time out
    (elixir 1.10.4) lib/gen_server.ex:1023: GenServer.call/3
    (horde 0.8.2) lib/horde/node_listener.ex:50: Horde.NodeListener.set_members/1
    (horde 0.8.2) lib/horde/node_listener.ex:34: Horde.NodeListener.handle_info/2
    (stdlib 3.12) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib 3.12) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib 3.12) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:nodeup, :"hostname@172.31.41.100", [node_type: :visible]}

Horde Version: 0.8.2
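
For readers unfamiliar with the exit reason in the logs above, here is a minimal sketch of how a `GenServer.call/3` timeout surfaces when the target process is busy longer than the caller's timeout. `SlowServer` and the 500 ms / 100 ms values are hypothetical and only for illustration; this is not Horde's code and does not reproduce the deadlock itself.

```elixir
defmodule SlowServer do
  # Hypothetical GenServer that stays busy longer than the caller is willing to wait.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call(:get_telemetry, _from, state) do
    # Simulate a blocked or overloaded process.
    Process.sleep(500)
    {:reply, %{}, state}
  end
end

{:ok, pid} = SlowServer.start_link()

result =
  try do
    # The caller gives up after 100 ms and exits with
    # {:timeout, {GenServer, :call, [pid, :get_telemetry, 100]}}.
    GenServer.call(pid, :get_telemetry, 100)
  catch
    # Catching the exit lets the caller skip this poll instead of crashing.
    :exit, {:timeout, {GenServer, :call, _args}} -> :skipped
  end

IO.inspect(result)
```

A caller that does not catch the exit terminates, which is consistent with the `NodeListener terminating` entry in the second log.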

philipgiuliani commented 4 years ago

This error may have been related to the supervisor deadlock in Horde 0.8.2. So far it has not happened again with Horde 0.8.3.
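
For anyone still on 0.8.2, a minimal sketch of the dependency bump, assuming Horde is declared in the project's `mix.exs` (the surrounding project definition is not shown in this thread and is assumed):

```elixir
# mix.exs (fragment); assumes a standard Mix project layout
defp deps do
  [
    # "~> 0.8.3" allows 0.8.3 and later 0.8.x releases, but not 0.9.0
    {:horde, "~> 0.8.3"}
  ]
end
```

After changing the requirement, `mix deps.update horde` fetches the newer release.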

derekkraan commented 4 years ago

Thanks for checking back in, and please reopen if you see this happen again with >= 0.8.3!