Closed · msw10100 closed this issue 6 years ago
I made the following patch to tracker.ex, which resolved the issue for me:
diff --git a/lib/swarm/tracker/tracker.ex b/lib/swarm/tracker/tracker.ex
index 40d846b..dd318db 100644
--- a/lib/swarm/tracker/tracker.ex
+++ b/lib/swarm/tracker/tracker.ex
@@ -1137,6 +1137,10 @@ defmodule Swarm.Tracker do
warn "#{inspect name} could not be started on #{remote_node}: #{inspect err}, retrying operation after #{@retry_interval}ms.."
:timer.sleep @retry_interval
start_pid_remotely(remote_node, from, name, meta, state, attempts + 1)
+ {:error, :undef} = err ->
+ warn "#{inspect name} could not be started on #{remote_node}: #{inspect err}, retrying operation after #{@retry_interval}ms.."
+ :timer.sleep @retry_interval
+ start_pid_remotely(remote_node, from, name, meta, state, attempts + 1)
{:error, _reason} = err ->
warn "#{inspect name} could not be started on #{remote_node}: #{inspect err}"
reply(from, err)
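One way the duplicated clause in the patch could be avoided (a sketch, not the actual Swarm tracker code; `RetryableStart`, `start_fun`, and the constants below are simplified stand-ins for the tracker's internals) is to match every retryable reason in a single clause with a guard:

```elixir
defmodule RetryableStart do
  require Logger

  @retry_interval 100
  @max_attempts 5
  # :undef ("module not yet loaded on the target node") treated as
  # transient; any other reason considered retryable can be added here.
  @retryable_reasons [:undef]

  # Calls start_fun (a stand-in for the remote start call) and retries on
  # any reason in @retryable_reasons, up to @max_attempts times.
  def start(start_fun, attempts \\ 0) do
    case start_fun.() do
      {:error, reason} = err
      when reason in @retryable_reasons and attempts < @max_attempts ->
        Logger.warning("start failed: #{inspect(err)}, retrying in #{@retry_interval}ms")
        :timer.sleep(@retry_interval)
        start(start_fun, attempts + 1)

      other ->
        other
    end
  end
end
```

With a guard like this, adding another transient error later only means extending `@retryable_reasons` rather than copying the whole clause body again.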
Is there a better approach that I should consider? Or should I make a PR for this?
Thanks for the bug report and patch @msw10100.
Can you create a pull request for the change and I'll get it merged in?
Fixed by #56.
Using Swarm 3.0.5, when processes are running on one node and a new node joins (causing processes to be redistributed to the new node), I often see a warning on the originating node:
and on the target node:
So it would appear that, since Swarm starts before our application's modules (Gptest, above), the VM has not yet loaded Gptest into memory when Swarm attempts to start it on the new node. And since there are no retries when that error occurs, the process stays down until some external entity restarts it.
Is there a mechanism I could use to either delay the attempt to start the process or retry when I get this error?
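On the application side, one hedged workaround (not a Swarm API; `WaitForModule` and its parameters are hypothetical names) is to block until the module is actually loadable on the local node before asking Swarm to register or start processes that use it, using `Code.ensure_loaded?/1`:

```elixir
defmodule WaitForModule do
  # Polls Code.ensure_loaded?/1 until `module` can be loaded or
  # `max_attempts` is exhausted, sleeping `interval_ms` between tries.
  def await(module, interval_ms \\ 100, max_attempts \\ 50)

  def await(_module, _interval_ms, 0), do: {:error, :undef}

  def await(module, interval_ms, attempts) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      :timer.sleep(interval_ms)
      await(module, interval_ms, attempts - 1)
    end
  end
end
```

For example, `WaitForModule.await(Gptest)` could be called before the Swarm registration during node startup, so the `:undef` window is narrowed even without the tracker-side retry patch above.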