alfetahe / process-hub

Distributed processes manager and global process registry
GNU General Public License v3.0

Questions about interacting with active processes and cluster-global processes #2

Closed by bjuretic 2 months ago

bjuretic commented 2 months ago

Hello,

Nice library! I am trying to experiment with a few things, but the lack of examples makes it slow going, so let me ask here.

  1. If I want to communicate with a process on the cluster, I need to know whether it is "active" or "passive" in your terminology. Is there any way to find that out, so that I can send the message to the currently active process? Imagine that one process is a "service" and the other nodes want it to do something: I need to know which of the processes returned by ProcessHub.process_list to call.
  2. I am trying to make some processes work as "cluster-globals", i.e. just one (active) process on the cluster. Take user sessions as an example: a process holding state for all logged-in users across all nodes. How would you go about creating and configuring this within your paradigm? Using ProcessHub.Strategy.Redundancy.Replication with replication_factor set to cluster_size, the process is started on all nodes, but without knowing which process is "active" I don't see how I can consistently call that one process.
  3. When a node joins or leaves the cluster, the process is migrated and restarted (if needed) on another node, so this works fine. But when the gen_server process crashes (e.g. I cause an intentional exit with throw in one of the handle_call functions), the local supervisor does not restart it (and neither does the supervisor on other nodes), which is weird. Is there something that needs to be configured for this classic supervisor restart to work?

Related to the last question, the registry shows the process as still running:

iex> ProcessHub.process_registry :my_hub
%{
  my_process: {%{
     id: :my_process,
     start: {My.Process, :start_link, []}
   }, ["n2@macbook": #PID<0.457.0>]}
}

...but actually the process is dead, and calling it via this pid above results in:

iex> My.Process.list
** (exit) exited in: GenServer.call(#PID<0.457.0>, {:list, 1}, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.16.3) lib/gen_server.ex:1114: GenServer.call/3
    iex:3: (file)

And this is how I am looking up the process internally on each gen_server call:

  defp my_pid() do
    result =
      ProcessHub.process_list(:my_hub, :global)
      |> Keyword.get(:my_process, [])
      |> Keyword.values()
      |> List.first()

    case result do
      nil -> throw("Process not started")
      pid -> pid
    end
  end

After a bit of debugging with Logger, I can see that the process actually is restarted locally as it should be, but ProcessHub's registry still seems to hold the old pid, so all the messages go to the old, dead process.

Thanks in advance for the answers!

alfetahe commented 2 months ago

Hello,

Thank you for reporting the bug. I have fixed the issue with the process registry synchronization in the recent release, version 0.2.6-alpha.

Now, to address your questions 1 and 2: When using the ProcessHub.Strategy.Redundancy.Replication strategy in active/passive modes, only the involved processes know which one is active and which one is passive. Other processes do not know this unless they query the processes themselves to determine their status. The macro for the GenServer simply stores some data in the process loop, including the status.

I have considered making this status visible to other processes by including it in the process registry's metadata section. This may be implemented in a future release, but it is not guaranteed.
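As a rough illustration of the "query the processes themselves" approach, here is a minimal sketch using only plain GenServer. StatusDemo.Worker, its :mode call, and StatusDemo.find_active/1 are hypothetical names invented for this example; they are not part of ProcessHub, and the library's own macro may keep the status in a different form.

```elixir
defmodule StatusDemo.Worker do
  # Hypothetical worker that keeps its own :active / :passive mode in its state.
  # This is NOT ProcessHub's macro; it only shows how a status query could be answered.
  use GenServer

  def start_link(mode) when mode in [:active, :passive] do
    GenServer.start_link(__MODULE__, mode)
  end

  # Ask a worker which mode it is currently in.
  def mode(pid), do: GenServer.call(pid, :mode)

  @impl true
  def init(mode), do: {:ok, %{mode: mode}}

  @impl true
  def handle_call(:mode, _from, state), do: {:reply, state.mode, state}
end

defmodule StatusDemo do
  # Given the pids registered for one child id (for example, the pids taken
  # from ProcessHub.process_list/2), return the first one reporting :active,
  # or nil if none does.
  def find_active(pids) do
    Enum.find(pids, fn pid ->
      try do
        StatusDemo.Worker.mode(pid) == :active
      catch
        # A dead or unreachable replica makes GenServer.call/2 exit; skip it.
        :exit, _reason -> false
      end
    end)
  end
end
```

Each lookup costs one call per replica, so a caller that sends many messages would probably want to cache the result and refresh it when a call fails.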

What I recommend is setting up a separate hub for your cluster globals. With ProcessHub, you can start multiple hubs.

For example, you can create one hub for your cluster's global services and another for global workers:

def start(_type, _args) do
    children = [
        ProcessHub.child_spec(%ProcessHub{hub_id: :services}),
        ProcessHub.child_spec(%ProcessHub{hub_id: :workers})
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
end

To get the PID of a specific service worker, you can use:

ProcessHub.get_pid(:services, :user_service)
#PID<0.228.0>

The get_pid/2 and get_pids/2 functions are new API additions to simplify querying the process registry.
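For instance, a client-side wrapper around get_pid/2 might look like the sketch below. ServiceClient and its injectable lookup argument are hypothetical, and the code deliberately makes no assumption about what get_pid/2 returns when nothing is registered: anything that is not a pid is treated as "not started".

```elixir
defmodule ServiceClient do
  # Hypothetical wrapper: resolves the service pid on every call, then forwards
  # the request. The lookup function is injectable so it can be stubbed in tests;
  # by default it uses ProcessHub.get_pid/2 with the ids from the example above.
  def call(request, lookup \\ fn -> ProcessHub.get_pid(:services, :user_service) end) do
    case lookup.() do
      pid when is_pid(pid) ->
        {:ok, GenServer.call(pid, request)}

      other ->
        # Assumption: any non-pid result means the service is not registered.
        {:error, {:not_started, other}}
    end
  end
end
```

Returning a tagged tuple instead of using throw lets the caller decide whether a missing service should be retried or reported.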

I hope this helps!

bjuretic commented 2 months ago

Hi Anuar,

nice to meet you and thanks for the help. I can confirm that your fix solved the problem.

Also, get_pid and get_pids are useful, thanks.

One more thing came up right away. Some processes need to start immediately on node startup. The logical place for this is Application.start, but it is a bit clunky, because ProcessHub.start_child has to come after the Supervisor.start_link call for the main application itself.

    sup_result = Supervisor.start_link(children, opts)

    ProcessHub.start_child(:my_hub, %{id: :sessions, start: {Sessions, :start_link, []}}, async_wait: true)

    sup_result

Is this the proper way to start it?

Thank you in advance!

alfetahe commented 2 months ago

Currently, this is the only way to start the child processes. However, this looks like low-hanging fruit, and I will likely add a feature in the next release so that child processes can be declared statically, similar to how children are added to regular Supervisors.

I also noted your example code:

ProcessHub.start_child(:my_hub, %{id: :sessions, start: {Sessions, :start_link, []}}, async_wait: true)

You're using the async_wait option but not actually waiting for the response with ProcessHub.await(), meaning the caller process will receive unhandled messages in its mailbox. You should either remove async_wait: true or wait for the child startup. I will make this clearer in the documentation.
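A minimal sketch of the "wait for the child startup" variant, reusing the :my_hub hub and the Sessions child spec from the example above. MyApp.StartupChildren is a hypothetical helper, and piping the result of start_child straight into ProcessHub.await() is an assumption about how await is meant to be called; check the documentation for your version before relying on it.

```elixir
defmodule MyApp.StartupChildren do
  # Hypothetical helper, meant to be called from Application.start/2 right
  # after Supervisor.start_link/2 has started the :my_hub hub.
  def start_sessions(hub_id \\ :my_hub) do
    child_spec = %{id: :sessions, start: {Sessions, :start_link, []}}

    # With async_wait: true the hub reports the startup back to the caller.
    # Awaiting that result keeps the messages out of the caller's mailbox.
    # Assumption: ProcessHub.await/1 accepts the value returned by start_child.
    hub_id
    |> ProcessHub.start_child(child_spec, async_wait: true)
    |> ProcessHub.await()
  end
end
```

If the startup result is not needed, calling ProcessHub.start_child(:my_hub, child_spec) without async_wait: true avoids the stray messages as well.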