While deploying on Kubernetes, I occasionally noticed this error:
00:12:09.100 [error] GenServer Service.Scheduler.HordeSupervisor terminating
** (MatchError) no match of right hand side value: nil
(horde 0.7.1) lib/horde/dynamic_supervisor_impl.ex:252: Horde.DynamicSupervisorImpl.handle_cast/2
(stdlib 3.11.2) gen_server.erl:637: :gen_server.try_dispatch/4
(stdlib 3.11.2) gen_server.erl:711: :gen_server.handle_msg/6
(stdlib 3.11.2) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:relinquish_child_process, 120031833532712903379486492195407090876}}
00:12:09.102 [error] GenServer #PID<0.4364.0> terminating
** (stop) exited in: GenServer.call(Service.Scheduler.HordeSupervisor, :horde_shutting_down, 5000)
** (EXIT) an exception was raised:
** (MatchError) no match of right hand side value: nil
(horde 0.7.1) lib/horde/dynamic_supervisor_impl.ex:252: Horde.DynamicSupervisorImpl.handle_cast/2
(stdlib 3.11.2) gen_server.erl:637: :gen_server.try_dispatch/4
(stdlib 3.11.2) gen_server.erl:711: :gen_server.handle_msg/6
(stdlib 3.11.2) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
(elixir 1.10.1) lib/gen_server.ex:1023: GenServer.call/3
(horde 0.7.1) lib/horde/signal_shutdown.ex:21: anonymous fn/1 in Horde.SignalShutdown.terminate/2
(elixir 1.10.1) lib/enum.ex:783: Enum."-each/2-lists^foreach/1-0-"/2
(elixir 1.10.1) lib/enum.ex:783: Enum.each/2
(stdlib 3.11.2) gen_server.erl:673: :gen_server.try_terminate/3
(stdlib 3.11.2) gen_server.erl:858: :gen_server.terminate/10
(stdlib 3.11.2) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.4360.0>, :shutdown}
It looks like the root cause is that the child process in question is not present in the node's state, so the lookup returns nil and the match fails. This could simply be a matter of the CRDT not being fully synced yet.
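To illustrate the failure mode in isolation, here is a minimal sketch using a plain map. It assumes the crashing line does a strict tuple match on the result of a map lookup, which is my reading of the stack trace rather than something I have confirmed in the Horde source. `MatchSketch`, `strict_lookup/2`, and the `:placeholder` tuple elements are made up for illustration.

```elixir
defmodule MatchSketch do
  # Hypothetical helper: a strict match that assumes the id is always present.
  def strict_lookup(processes_by_id, child_id) do
    {_, child, _} = Map.get(processes_by_id, child_id)
    child
  end
end

# Stand-in for state.processes_by_id; the tuple contents are placeholders.
processes_by_id = %{1 => {:placeholder, %{id: 1}, :placeholder}}

# A known id matches the three-element tuple and returns the child.
IO.inspect(MatchSketch.strict_lookup(processes_by_id, 1))

# An unknown id makes Map.get/2 return nil, so the same match raises.
try do
  MatchSketch.strict_lookup(processes_by_id, 2)
rescue
  e in MatchError -> IO.puts("MatchError on: " <> inspect(e.term))
end
```

Inside a GenServer callback nothing rescues the error, so the whole process terminates, which is what the log shows.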
Anyway, I think it would be sensible to handle this gracefully (and not crash the process). Something like this should work:
def handle_cast({:relinquish_child_process, child_id}, state) do
  # signal to the rest of the nodes that this process has been relinquished
  # (to the Horde!) by its parent
  case Map.get(state.processes_by_id, child_id) do
    {_, child, _} ->
      :ok =
        DeltaCrdt.mutate(
          crdt_name(state.name),
          :add,
          [{:process, child.id}, {nil, child}]
        )

    nil ->
      # the process doesn't exist in the local state. state not in sync?
      nil
  end

  {:noreply, state}
end
I'll just add it to my existing PR (#194).