derekkraan / horde

Horde is a distributed Supervisor and Registry backed by DeltaCrdt
MIT License
1.32k stars 106 forks source link

Registry.lookup doesn't always return new item that was just registered #267

Open jweinkam opened 9 months ago

jweinkam commented 9 months ago

I have in my application start

{Horde.Registry, members: :auto, keys: :unique, name: MyRegistry}

For the code

Horde.Registry.register(MyRegistry, "name", "value")
result = Horde.Registry.lookup(MyRegistry, "name") 

I would expect result to be [{#PID<...>, "value"}] But most of the time I get []

jweinkam commented 9 months ago

This is using v0.9.0. Version v0.8.7 always gave the expected result. I believe this is related to the change in https://github.com/derekkraan/horde/commit/053e1aea33dc19421b175071b59bc615eec7225b

derekkraan commented 9 months ago

This is intended behaviour. In local cases, it is expected that you will know the pid of the process (that you have just registered). Is this possible in your case? (Otherwise I would like to know more details)

jweinkam commented 9 months ago

I am attempting to do

    GenStateMachine.start_link(__MODULE__, args,  name: {:via, Horde.Registry, {MyRegistry, "name"}})

This is frequently failing with error message

{:error, {:process_not_registered_via, Horde.Registry}}

Since internally in GenStateMachine.start_link it does a register, followed immediately by lookup and get [] instead of [{#PID<...>, ...}]

derekkraan commented 9 months ago

Thanks for the clarification. Usually I would expect that you wouldn't register and then immediately do a lookup, since theoretically to register you need the PID anyways, but it seems that :gen_statem(?) is doing exactly that.

Do you have a link to the location in the source where this is happening?

I don't have an immediate solution to your problem. I would suggest for now to stick with v0.8.7.

jweinkam commented 9 months ago

GenStateMachine.start_link, call :gen_statem.start_link, which call :gen.start(?MODULE, link, Module, Args, Opts), when eventually calls :gen.init_it(GenMod, Starter, Parent, Mod, Args, Options). This is in gen.erl at line 188. It calls to register, then calls GenMod:init_it (where GenMod = :gen_statem). In :gen_statem.init_it the first this it does is gen:get_proc_name which calls lookup. This is gen_statem.erl at line 1015. So the lines are side by side, but the calls are very quickly one right after the other.

arjan commented 9 months ago

Maybe we should let register block until the pid has propagated in the crdt? Although that might make things slow if the sync interval is high.. 🤔

derekkraan commented 9 months ago

Calling Horde.Registry.register/3 does a GenServer call into Horde.RegistryImpl, which calls into DeltaCrdt.put/4, which results in a GenServer.call/3 to the DeltaCrdt instance. Before it returns, DeltaCrdt does a few Kernel.send/2, back to the Horde.RegistryImpl.

So in theory, these messages have all been sent before Horde.Registry.register/3 returns, which should mean that they are received by Horde.RegistryImpl before the next call.

It is confusing to me that this appears not to be the case. I thought that sent messages were always received in order.

jweinkam commented 9 months ago

The registry lookup doesn't do a call but instead just reads from ets. Thus the read for ets can happen before the other sends are processed.

derekkraan commented 9 months ago

Oh yes that is true. I am also now reading that message ordering is only preserved between two processes, so if a third process is involved, the messages can be interleaved.

jweinkam commented 9 months ago

I am able to work around the problem by not specifying name: {:via, Horde.Registry, {MyRegistry, "name"}} and instead having my init call Horde.Registry.register(MyRegistry, name, self()). This results in bypassing the register, lookup during gen_statem start, while still getting the process registered so that calls can be made using {:via, Horde.Registry, {MyRegister, "name"}}.

nifoc commented 7 months ago

I'm sorry for what is essentially a +1 comment, but I've just discovered the same issue.

I too am using gen_statem and a via tuple. Everything worked with the plain old Registry, so I expected Horde.Registry to be a drop-in replacement.

philipgiuliani commented 6 months ago

I've also stumbled on this issue today when debugging a crash in our system. We are also having it with gen_statem.

derekkraan commented 6 months ago

We could potentially put a second ets table in front of the existing one, to store new local registrations in. Then we could expire them after some very short period of time (milliseconds). This would cover the gap between registration and propagation of the registration back to the local registry.

We only need to cover the amount of time it takes for the message to be sent to the local process and recorded in the real ETS table. Unless your process is flooded with messages, this will be very quick.

I'm still mulling over how good of an idea this is. [expectation management] I don't have any time to work on this at the moment.