derekkraan / horde

Horde is a distributed Supervisor and Registry backed by DeltaCrdt

Registry is not up to date when starting two nodes #89

Closed jfrolich closed 5 years ago

jfrolich commented 5 years ago

I think this is a bug.

TL;DR: The distributed registry doesn't retrieve state when a node joins a cluster after running set_members.

This is in my application.ex:

[
  ...,
  {Cluster.Supervisor, [Application.get_env(:libcluster, :topologies)]},
  {Horde.Registry, name: FamilyFive.HordeRegistry, keys: :unique},
  {Horde.Supervisor,
   name: FamilyFive.HordeSupervisor, strategy: :one_for_one, children: []},
  ...
]

And this is from a Tracker module that runs whenever the participating nodes are retrieved or change:

Horde.Cluster.set_members(
  FamilyFive.HordeRegistry,
  Enum.map(nodes, fn n -> {FamilyFive.HordeRegistry, n} end)
)

Horde.Cluster.set_members(
  FamilyFive.HordeSupervisor,
  Enum.map(nodes, fn n -> {FamilyFive.HordeSupervisor, n} end)
)

When I start a new process:

Horde.Supervisor.start_child(
  FamilyFive.HordeSupervisor,
  FamilyFive.PushNotifications.PushNotificationsScheduler
)

The FamilyFive.PushNotifications.PushNotificationsScheduler module registers itself through the following start_link call:

GenServer.start_link(__MODULE__, [],
  name: {:via, Horde.Registry, {FamilyFive.HordeRegistry, __MODULE__}}
)

Then when I start up a new node and run this on the new node:

Horde.Registry.lookup(
  FamilyFive.HordeRegistry,
  FamilyFive.PushNotifications.PushNotificationsScheduler
)

It returns :undefined even though it returns the pid correctly on the other node. This inconsistency is not resolved automatically.

If I start both nodes first and then start the process on one of them, the lookup works. But if I start one node, start the process, and only then spin up the second node and run the lookup there, it doesn't work.

I think it might have to do with the fact that I don't specify the members on startup, but I don't know the members at startup (they are only known once libcluster queries the DNS).
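
For illustration, a tracker of the kind described above could be sketched as follows. This is a hypothetical module (NodeTrackerSketch and its :names and :set_members_fun options are names invented for this sketch), not the code from the application in question and not part of Horde. It assumes that reacting to the standard :nodeup and :nodedown messages is enough to know when membership changes, and it takes the function that applies the member list as an argument so the sketch stays self-contained.

```elixir
defmodule NodeTrackerSketch do
  # Hypothetical tracker, not part of Horde: recomputes the member list
  # whenever a node joins or leaves and passes it to `set_members_fun`.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Builds {name, node} tuples in the same shape as the set_members calls above.
  def members(name, nodes), do: Enum.map(nodes, fn n -> {name, n} end)

  @impl true
  def init(opts) do
    # Subscribe this process to {:nodeup, node} and {:nodedown, node} messages.
    :ok = :net_kernel.monitor_nodes(true)

    state = %{
      names: Keyword.fetch!(opts, :names),
      set_members_fun: Keyword.fetch!(opts, :set_members_fun)
    }

    # Apply the membership that is already known when the tracker starts.
    {:ok, sync(state)}
  end

  @impl true
  def handle_info({:nodeup, _node}, state), do: {:noreply, sync(state)}
  def handle_info({:nodedown, _node}, state), do: {:noreply, sync(state)}
  def handle_info(_other, state), do: {:noreply, state}

  # Pushes the current node list to every tracked name.
  defp sync(state) do
    nodes = [Node.self() | Node.list()]

    Enum.each(state.names, fn name ->
      state.set_members_fun.(name, members(name, nodes))
    end)

    state
  end
end
```

With Horde available, such a tracker might be started with names: [FamilyFive.HordeRegistry, FamilyFive.HordeSupervisor] and set_members_fun: &Horde.Cluster.set_members/2, so that each membership change results in the same two set_members calls shown earlier.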

derekkraan commented 5 years ago

I have a strong suspicion that this is fixed in #92 (See #90 for details). Could you please test that branch and let me know what happens?

jfrolich commented 5 years ago

It does seem to resolve it!

jfrolich commented 5 years ago

However, I get the following error if I call :init.stop():

[error] GenServer FamilyFive.HordeSupervisor.Crdt terminating
** (ArgumentError) argument error
    :erlang.send(FamilyFive.HordeSupervisor, {:crdt_update, [{:add, {:process, FamilyFive.PushNotifications.PushNotificationsScheduler}, {nil, %{id: FamilyFive.PushNotifications.PushNotificationsScheduler, modules: [FamilyFive.PushNotifications.PushNotificationsScheduler], restart: :permanent, shutdown: 5000, start: {FamilyFive.PushNotifications.PushNotificationsScheduler, :start_link, []}, type: :worker}}}]})
    (horde) lib/horde/supervisor_supervisor.ex:12: anonymous fn/2 in Horde.SupervisorSupervisor.init/1
    (delta_crdt) lib/delta_crdt/causal_crdt.ex:201: DeltaCrdt.CausalCrdt.update_state_with_delta/3
    (delta_crdt) lib/delta_crdt/causal_crdt.ex:166: DeltaCrdt.CausalCrdt.handle_cast/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:operation, {:add, [process: FamilyFive.PushNotifications.PushNotificationsScheduler, nil: %{id: FamilyFive.PushNotifications.PushNotificationsScheduler, modules: [FamilyFive.PushNotifications.PushNotificationsScheduler], restart: :permanent, shutdown: 5000, start: {FamilyFive.PushNotifications.PushNotificationsScheduler, :start_link, []}, type: :worker}]}}}
State: %DeltaCrdt.CausalCrdt{crdt_module: DeltaCrdt.AWLWWMap, crdt_state: %DeltaCrdt.AWLWWMap{dots: %{434327776 => 3, 704243284 => 3, 883112317 => 8}, value: %{{:member, {FamilyFive.HordeSupervisor, :"a@127.0.0.1"}} => %{{1, 1555133645008603000} => #MapSet<[{883112317, 7}]>}, {:member, {FamilyFive.HordeSupervisor, :"c@127.0.0.1"}} => %{{1, 1555133616375684000} => #MapSet<[{883112317, 3}]>}, {:member_node_info, {FamilyFive.HordeSupervisor, :"a@127.0.0.1"}} => %{{%Horde.Supervisor.Member{name: {FamilyFive.HordeSupervisor, :"a@127.0.0.1"}, status: :alive}, 1555133645053704000} => #MapSet<[{704243284, 1}]>}, {:member_node_info, {FamilyFive.HordeSupervisor, :"c@127.0.0.1"}} => %{{%Horde.Supervisor.Member{name: {FamilyFive.HordeSupervisor, :"c@127.0.0.1"}, status: :shutting_down}, 1555133652587732000} => #MapSet<[{883112317, 8}]>}, {:process, FamilyFive.PushNotifications.PushNotificationsScheduler} => %{{{{FamilyFive.HordeSupervisor, :"c@127.0.0.1"}, %{id: FamilyFive.PushNotifications.PushNotificationsScheduler, restart: :permanent, start: {FamilyFive.PushNotifications.PushNotificationsScheduler, :start_link, []}}}, 1555133621006383000} => #MapSet<[{883112317, 4}]>}}}, merkle_tree: %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: %{{:member, {FamilyFive.HordeSupervisor, :"c@127.0.0.1"}} => <<58, 230>>}, hash: "bq"}}, hash: <<133, 107>>}}, hash: "Ҝ"}}, hash: <<61, 214>>}, nil}, hash: <<113, 155>>}}, hash: <<123, 28>>}}, hash: <<155, 93>>}}, hash: <<185, 109>>}}, hash: "xP"}}, hash: <<128, 89>>}, nil}, hash: <<30, 123>>}}, hash: <<236, 29>>}, nil}, hash: <<47, 165>>}, nil}, hash: <<19, 32>>}, 
%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: %{{:process, FamilyFive.PushNotifications.PushNotificationsScheduler} => <<103, 206>>}, hash: <<243, 50>>}, nil}, hash: <<22, 60>>}}, hash: <<28, 43>>}}, hash: <<198, 240>>}, nil}, hash: <<238, 154>>}}, hash: <<192, 201>>}}, hash: <<42, 203>>}, nil}, hash: <<26, 208>>}}, hash: "Ż"}}, hash: <<46, 131>>}, nil}, hash: <<29, 99>>}}, hash: <<169, 248>>}}, hash: <<34, 181>>}}, hash: <<173, 218>>}}, hash: <<254, 33>>}, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: %{{:member, {FamilyFive.HordeSupervisor, :"a@127.0.0.1"}} => <<58, 230>>}, hash: <<247, 98>>}, nil}, hash: <<14, 194>>}, nil}, hash: "Fe"}, nil}, hash: <<9, 233>>}, nil}, hash: <<174, 79>>}, nil}, hash: "mF"}, nil}, hash: <<150, 233>>}, nil}, hash: <<199, 102>>}}, hash: <<108, 147>>}, nil}, hash: <<186, 84>>}, nil}, hash: <<57, 171>>}}, hash: <<240, 164>>}, nil}, hash: "{w"}, nil}, hash: "6m"}, nil}, hash: <<193, 79>>}}, hash: <<199, 211>>}, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: %{{:member_node_info, 
{FamilyFive.HordeSupervisor, :"c@127.0.0.1"}} => <<227, 223>>}, hash: <<171, 117>>}, nil}, hash: <<159, 14>>}, nil}, hash: <<73, 252>>}}, hash: <<24, 213>>}, nil}, hash: "#K"}}, hash: <<202, 243>>}, nil}, hash: <<136, 44>>}, nil}, hash: <<45, 21>>}}, hash: <<28, 9>>}, nil}, hash: <<206, 111>>}, nil}, hash: "4!"}, nil}, hash: <<224, 43>>}, nil}, hash: <<223, 59>>}}, hash: <<227, 8>>}}, hash: <<244, 112>>}, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {nil, %MerkleTree{children: {%MerkleTree{children: {%MerkleTree{children: %{{:member_node_info, {FamilyFive.HordeSupervisor, :"a@127.0.0.1"}} => <<252, 229>>}, hash: <<6, 197>>}, nil}, hash: "*?"}, nil}, hash: <<193, 91>>}}, hash: <<153, 32>>}, nil}, hash: <<251, 11>>}}, hash: "hb"}}, hash: <<197, 44>>}}, hash: <<118, 222>>}}, hash: <<242, 202>>}}, hash: <<78, 192>>}, nil}, hash: <<12, 139>>}, nil}, hash: <<143, 159>>}, nil}, hash: <<138, 226>>}}, hash: <<139, 112>>}, nil}, hash: "q\v"}}, hash: <<160, 148>>}}, hash: "}V"}, name: FamilyFive.HordeSupervisor.Crdt, neighbours: #MapSet<[{FamilyFive.HordeSupervisor.Crdt, :"a@127.0.0.1"}]>, node_id: 883112317, on_diffs: #Function<0.101987752/1 in Horde.SupervisorSupervisor.init/1>, sequence_number: 0, storage_module: nil, sync_interval: 100}

derekkraan commented 5 years ago

@jfrolich, thanks for your feedback, I will consider this issue resolved.

The error you are seeing is tricky (although not indicative of incorrect operation), but it would be nice to have it fixed. Could you open a separate issue to track this one?