derekkraan / horde

Horde is a distributed Supervisor and Registry backed by DeltaCrdt
MIT License
1.32k stars 106 forks source link

Strange behaviour when using Supervisor as a Horde.DynamicSupervisor's child #221

Closed krasenyp closed 4 years ago

krasenyp commented 4 years ago

Hi there. I'm trying to start a Supervisor process as a child of Horde.DynamicSupervisor in a multi-node setup but I'm facing a strange issue. I use two nodes for testing purposes and there's 2-5 seconds between the initialization of the first node and the start of the second. Every time both Supervisors are started when only one of them should be started. Here's a MWE:

application.ex

defmodule Test.Application do
  use Application

  def start(_type, _args) do
    topologies = [
      example: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"a@127.0.0.1", :"b@127.0.0.1"]]
      ]
    ]

    children = [
      {Cluster.Supervisor, [topologies, [name: Test.ClusterSupervisor]]},
      {Horde.Registry, name: Test.HordeRegistry, keys: :unique, members: :auto},
      {Horde.DynamicSupervisor,
       name: Test.HordeSupervisor, strategy: :one_for_one, members: :auto},
      %{
        id: Test.ClusterConnector,
        restart: :transient,
        start:
          {Task, :start_link,
           [
             fn ->
               Horde.DynamicSupervisor.start_child(
                 Test.HordeSupervisor,
                 Test.Supervisor
               )
             end
           ]}
      }
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

supervisor.ex

defmodule Test.Supervisor do
  use Supervisor

  alias Test.{AMQP, Events, HordeRegistry}

  def start_link(args) do
    case Supervisor.start_link(__MODULE__, args, name: via_tuple()) do
      {:ok, pid} ->
        {:ok, pid}

      {:error, {:already_started, _pid}} ->
        :ignore
    end
  end

  def init(_args) do
    children = [
      Events.Supervisor,
      AMQP.Supervisor
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  defp via_tuple() do
    {:via, Horde.Registry, {HordeRegistry, __MODULE__}}
  end
end

I feel I'm missing something but I follow the documentation and the available examples. However, if I have the following application.ex there are no issues and everything work as expected:

defmodule Test.Application do
  use Application

  def start(_type, _args) do
    topologies = [
      example: [
        strategy: Cluster.Strategy.Epmd,
        config: [hosts: [:"a@127.0.0.1", :"b@127.0.0.1"]]
      ]
    ]

    children = [
      {Cluster.Supervisor, [topologies, [name: Test.ClusterSupervisor]]},
      {Horde.Registry, name: Test.HordeRegistry, keys: :unique, members: :auto},
      {Horde.DynamicSupervisor,
       name: Test.HordeSupervisor, strategy: :one_for_one, members: :auto},
      %{
        id: Test.ClusterConnector,
        restart: :transient,
        start:
          {Task, :start_link,
           [
             fn ->
               Horde.DynamicSupervisor.start_child(
                 Test.HordeSupervisor,
                 Test.AMQP.Consumer
               )
             end
           ]}
      }
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

Test.AMQP.Consumer is a GenServer. Can only GenServers be added to a Horde.DynamicSupervisor? I looked at the source but couldn't find anything to prove only GenServers can be distributed with horde.

derekkraan commented 4 years ago

Hi @krasenyp,

The main issue here is that Horde.Registry sends an exit signal to whichever process it has decided is unneeded, when it detects a duplicate. However, Supervisor only listens for exit signals from its parent supervisor (in this case a Horde.DynamicSupervisor process). So Supervisor is ignoring the exit signals it's getting from Horde.Registry, and you're left with two supervisors running, not what you want.

You'll likely have to figure out some other way to make this work for you. One possible solution would be to start a child of your Supervisor that owns the via tuple, and also directs the Supervisor to shut down when it receives an exit signal.

Let me know what you come up with. This is perhaps something that should be covered more explicitly in the docs.

Cheers Derek