Closed mszmurlo closed 3 years ago
I just got bitten by the same issue.
I couldn't find a GenServer I had just started when it was on a different node than the caller. I had no issue if the caller and the callee were on the same node, or if the callee was a long-running process started a while ago. The replication latency explains it, and I was able to confirm that this is the issue by simply waiting a bit after starting the remote GenServer.
Actually, since we get the pid of a child after starting it, we can manually register its name and it solves our issue:
case Horde.DynamicSupervisor.start_child(
       MyHordeSupervisor,
       {__MODULE__, my_args}
     ) do
  {:ok, pid} ->
    Horde.Registry.register_name(
      {MyHordeRegistry, my_name},
      pid
    )

    :ok

  {:error, error} ->
    {:error, error}
end
@mszmurlo This is how Horde works, which is documented here: https://hexdocs.pm/horde/eventual_consistency.html#content
@derekkraan Right. It's just that the delay seems very significant. It would be very useful to at least have a choice between a "synchronous" registration that returns only once the information has been dispatched to all nodes and the current version, where we can't be sure that all nodes are aware of the information (though I believe most of the time that's not important).
@mszmurlo I understand that you might find this useful, but Horde is not built this way, and I have no plans at this moment to support this feature. I also don't think it's quite as useful as you think it is, as most problems that people experience are related to new requests coming in while the information is still propagating across the cluster, and race conditions associated with that (can not find process, but when starting process, all of a sudden the information has propagated and now you get an error because the process can be found).
I will also note that you can adjust the sync_interval by passing the :delta_crdt_options option, documented here: https://hexdocs.pm/horde/Horde.DynamicSupervisor.html#t:option/0, if 250ms or whatever the default is, is not fast enough for you.
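As a rough illustration of the sync_interval suggestion above, here is a minimal sketch of a supervision tree fragment. The names MyHordeSupervisor and MyHordeRegistry are taken from the earlier comment; the 50 ms value is an arbitrary assumption, and passing :delta_crdt_options to Horde.Registry as well is assumed to work the same way as for Horde.DynamicSupervisor, so check the option types for the Horde version in use.

```elixir
# Sketch only: a supervision tree fragment with a shorter CRDT sync interval.
# The interval value (50 ms) is an assumed example, not a recommendation.
children = [
  {Horde.Registry,
   name: MyHordeRegistry,
   keys: :unique,
   # Assumption: the registry accepts the same :delta_crdt_options keyword.
   delta_crdt_options: [sync_interval: 50]},
  {Horde.DynamicSupervisor,
   name: MyHordeSupervisor,
   strategy: :one_for_one,
   delta_crdt_options: [sync_interval: 50]}
]

Supervisor.start_link(children, strategy: :one_for_one)
```

A shorter interval should narrow the window in which a freshly started process is not yet visible on other nodes, at the cost of more frequent sync traffic. It does not make registration synchronous.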
Hi,
I'm not sure whether this is a real issue or a bug of mine caused by a misunderstanding of how Horde works. I posted that question on Stack Overflow; the question got upvoted a couple of times but I got no answer, so I believe I'm not the only one who misunderstood something.
Basically, I would have thought that calling Horde.DynamicSupervisor.start_child(...) was synchronous, that is, that it returns only once all dispatching on all nodes of the cluster is done, as it is supposed to have the same behavior as the standard DynamicSupervisor. However, I've observed the following:
- I call Horde.DynamicSupervisor.start_child(myGenServer) on node N1 and immediately afterwards call an API on that GenServer.
- If the GenServer has been physically started on the other node, N2,
- then Horde.Registry.lookup(via_tuple(id)) doesn't find it for about 250ms on average. My timings range from 150ms to 350ms, but of course this depends on the system. The GenServer becomes available in the registry after that.
If Horde.DynamicSupervisor.start_child(...) is asynchronous, that's annoying as we don't know when it will return, so I ended up implementing a function get_pid() which actually waits until the GenServer is available (but I consider this quite dirty).
Here is an example of what I get in the log when I call a simple ping function:
which shows the client had been waiting 310ms to get the pid. That's a long latency, right?
If the call to start_child() is supposed to be synchronous, then either Horde has a bug, as it returns before all dispatching has been done, or I have a config error or a bug of my own, but I've spent a week on this without being able to spot it.
Just to give you an idea of what I'm doing:
- UserAgent, UA (implemented as a GenServer).
- login() controller where I create the UA and need to update its state:
Cheers
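For readers wondering what the get_pid() workaround mentioned in this post might look like, here is a minimal sketch, not the poster's actual code. The module name RegistryWait, the function await_pid, and the default timeout and polling interval are all hypothetical. The lookup is passed in as a function so the helper does not depend on a particular Horde version's return value for a missing key: anything other than a non-empty list of {pid, value} tuples is treated as "not found yet".

```elixir
defmodule RegistryWait do
  @moduledoc """
  Hypothetical helper: polls a lookup function until it yields a pid
  or the timeout elapses. All names and defaults here are assumptions.
  """

  # `lookup_fun` is a zero-arity function expected to return a list like
  # [{pid, value}] once the name is known, and anything else before that.
  def await_pid(lookup_fun, timeout_ms \\ 1_000, interval_ms \\ 25) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    poll(lookup_fun, deadline, interval_ms)
  end

  defp poll(lookup_fun, deadline, interval_ms) do
    case lookup_fun.() do
      [{pid, _value} | _rest] when is_pid(pid) ->
        {:ok, pid}

      _not_found ->
        if System.monotonic_time(:millisecond) >= deadline do
          # Give up once the deadline has passed.
          {:error, :timeout}
        else
          # Wait a little, then ask the registry again.
          Process.sleep(interval_ms)
          poll(lookup_fun, deadline, interval_ms)
        end
    end
  end
end
```

With the via_tuple(id) helper from the post, a call could look like `RegistryWait.await_pid(fn -> Horde.Registry.lookup(via_tuple(id)) end)`. Polling only hides the propagation delay from the caller; it does not remove the race conditions around concurrent requests described in the replies.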