bitwalker opened this issue 8 years ago
I have noticed some failures in swarm when ITCs are compared. I realized that the Clock.peek() operation is broken, returning an invalid structure which is not compatible with the other operations, which means that peeked clocks cannot be compared or joined. I put up a pull request with a fix: https://github.com/bitwalker/swarm/pull/77
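For context, a peeked clock is meant to be an anonymous copy of the local clock: it keeps the event (causal history) component but drops the identity, so the receiver can still join and compare it. Here is a minimal sketch of that idea, assuming the clock is represented as an {identity, event_tree} tuple as in the ITC paper; the module and names are illustrative, not Swarm's actual internals:

```elixir
defmodule ExampleClock do
  # Illustrative only: assumes an ITC clock is an {identity, event_tree} tuple.
  # Peeking should produce an *anonymous* clock: same causal history, zero
  # identity, so the receiving side can still join and compare it.
  def peek({_identity, event_tree}), do: {0, event_tree}
end
```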
While I was investigating this problem, I reviewed the general use of ITCs and it seems like the current implementation is flawed.
Here are a few rules which have to be implemented to make ITCs work:
- Use Clock.join() to receive causal information from remote (peeked) clocks. Clock.join() is not used anywhere in the Tracker, which means that causal information from other nodes is never updated after a broadcast. In order to use Clock.join() with peeked clocks, the ITC implementation has to be fixed first.
- Always call Clock.peek() before sending clocks to other replicas. This avoids a couple of bugs where the local node stores the wrong clock identity.
- Peeked clocks should also never be stored, because otherwise the identity gets lost: e.g. https://github.com/bitwalker/swarm/blob/e88966cad465477eb935b96c83e13c080b1a138b/lib/swarm/tracker/tracker.ex#L654
- Make sure that every node gets a properly forked clock when joining the cluster. This is actually the most complicated piece to fix.
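To make these rules concrete, here is a rough sketch of the full clock lifecycle they describe. The function names follow the Clock module as it is used in the Tracker (seed, fork, event, peek, join, leq), but the exact shapes (e.g. fork/1 returning a two-tuple of clocks, the aliased module name) are assumptions based on the ITC paper rather than a verified description of Swarm's implementation:

```elixir
# Module name assumed; the tracker aliases its ITC implementation as Clock.
alias Swarm.IntervalTreeClock, as: Clock

# 1. The first node seeds a clock; a node joining the cluster must receive
#    one half of a fork of an existing clock, never a plain copy of it.
seed = Clock.seed()
{my_clock, their_clock} = Clock.fork(seed)

# 2. Every local change bumps the local clock with an event.
my_clock = Clock.event(my_clock)

# 3. Only a peeked (identity-free) copy is ever sent to other replicas,
#    so the local identity cannot leak to another node.
sent = Clock.peek(my_clock)

# 4. The receiver joins the peeked clock into its own clock to pick up the
#    sender's causal history...
their_clock = Clock.join(their_clock, sent)

# 5. ...so that later comparisons with leq/2 reflect the true causal order.
Clock.leq(sent, their_clock)  #=> expected to be true
```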
Let's take a look at an example (handle_replica_event for :update_meta):
I think the code should look like this:
```elixir
cond do
  Clock.leq(lclock, rclock) ->
    # Join the peeked remote clock to take over causal information from the other node
    nclock = Clock.join(lclock, rclock)
    # Update the metadata and the clock
    Registry.update(name, meta: new_meta, clock: nclock)

  Clock.leq(rclock, lclock) ->
    # This node has data which causally dominates the remote;
    # ignore the request when the local clock wins
    :ok

  :else ->
    # We're going to take the last-writer-wins approach for resolution for now
    new_meta = Map.merge(old_meta, new_meta)
    # Join and bump our local clock, then re-broadcast the update to ensure we converge
    nclock = Clock.join(lclock, rclock)
    nclock = Clock.event(nclock)
    Registry.update(name, meta: new_meta, clock: nclock)
    debug "conflicting meta for #{inspect name}, updating and notifying other nodes"
    broadcast_event(state.nodes, Clock.peek(nclock), {:update_meta, new_meta, pid})
end
```
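The important detail in the :else branch is that joining both clocks and then issuing a new event produces a clock that causally dominates both sides, so when the peeked result is re-broadcast, the other replicas can converge on the merged metadata instead of repeatedly treating the update as a conflict.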
I found this issue because there is one process in our swarm of 3 nodes that does not start and we get this warning:
09:11:40.932 [warn] [swarm on node1@49800a15e1cc] [tracker:handle_replica_event] received track event for "device-1234", mismatched pids, local clock conflicts with remote clock, event unhandled
We have 3 nodes running, but this warning only occurs on one of them (49800a15e1cc). Strangely, Swarm.whereis_name always returns the same PID, even though the corresponding process is not running (anymore).
Is this a problem that will be fixed in Swarm? Is there a workaround for this?
I've got some pending work which addresses the usage of clocks inside Swarm, but I'm currently in the middle of a big move, so I've had to shelve things for a bit until I get settled in my new house - this is definitely on my list of priorities!
This issue is starting to occur more frequently in our system. I hope it can be fixed in the next release so we can keep using this nice package :)
We're running into the same warning/error as @h4cc.
I'm currently working on a new causally consistent model for Swarm, but in the meantime I believe we recently merged a PR which included fixes to the clock usage; you may want to give that a shot until I can get a release out.
To recap, the tracker uses an implementation of an Interval Tree Clock for resolving the causal history of replicated events. My understanding of ITC is that a scenario plays out roughly like so: a new replica gets its identity by forking an existing clock, each replica bumps its clock with an event on every local change, only a peeked (anonymous) copy of the clock is sent alongside replicated events, and the receiver joins that copy into its own clock so the two histories can later be compared.
In the case of Swarm, we can deterministically resolve conflicts because at any given point in time, we know which node in the cluster "owns" a given process/registration, with the exception of processes which are registered but not managed by Swarm. We can ask the conflicted process to shut down, or kill it, and the registration will remain with the "correct" version. However, since the way synchronization works today is that the registry is sent to the remote node, I'm not sure a clock is strictly needed. If we assume that it does help us though, we should ensure that our logic for handling the clock forking/peeking/incrementing is correct.
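As a rough illustration of that deterministic, ownership-based resolution (this is only a sketch based on the description above, not Swarm's actual conflict-handling code; HashRing.key_to_node/2 is libring's API, and ring, name, local_pid, and remote_pid are hypothetical variables assumed to be in scope):

```elixir
# Illustrative sketch: resolve a registration conflict by deferring to
# whichever node the consistent hash ring says owns the name.
owner_node = HashRing.key_to_node(ring, name)

{winner, loser} =
  if owner_node == Node.self() do
    {local_pid, remote_pid}
  else
    {remote_pid, local_pid}
  end

# Ask the losing process to shut down; the registration stays with `winner`.
Process.exit(loser, :shutdown)
```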