Closed Ivor closed 5 years ago
Hi,
If you want to share information between nodes, you can do this with, for example, `Horde.Registry.register/3` (and update with `Horde.Registry.update_value/3`) or `Horde.Registry.put_meta/3`. This information will be automatically synced and available from all nodes.
The other option is to use DeltaCrdt directly to sync data between processes. This is a bit more complicated, but I have written a blog post that outlines how it works: https://medium.com/@derek.kraan2/how-deltacrdt-can-help-you-write-distributed-elixir-applications-dc838c383ad5
The documentation for DeltaCrdt is also fairly good, but if you feel there is something missing then please let me know: https://hexdocs.pm/delta_crdt/DeltaCrdt.html
I'd welcome a PR on the example app with the change you mentioned.
Actually, come to think of it, `register` and `update_value` won't be of much use for this use case, since the values will be cleaned from the registry when the process dies. `put_meta` will be more useful here.
Hi Derek.
Thanks for the response. And thanks for writing a seemingly very useful library. I am keen to use it. I will have a look at `put_meta`. To be honest, I didn't think that Horde.Registry did anything related to sharing data between services. It might be the thing I am looking for.
I did read your post on DeltaCrdt. My problem with that was figuring out how to share `crdt1`, `crdt2`, etc. across nodes. In your example code you create a few CRDTs in the console and show that data is synced between them. My problem was figuring out how to share the reference across the members of the horde. How do you run `add_neighbours` when you don't have a reference to the CRDTs created on the other nodes?
I tried defining DeltaCrdts in my supervisor after looking at what you did in `Horde` itself. To be honest, I don't fully understand what you are doing in `Horde` (I am a noob), and as the rabbit hole started getting deeper I started to suspect that I might be missing the plot. Hence the question.
If I can figure out how to adapt the example app to share the counter I will definitely make a PR for that.
We're looking into using Horde and DeltaCrdt in production too, so if I can get this working I will be able to implement it in production and let you know how it goes.
I can already see that `Horde` will be good to prevent a periodic archiving function from running on multiple nodes. If I can share data between the nodes then it will have a broader application for us.
Thanks again for the time and the project. 🙏
To find the CRDT process on another node, generally you will give the process a name and use a tuple `{name, node}` to reference it. You can find an example of this in action here: https://github.com/derekkraan/horde/blob/master/examples/hello_world/lib/hello_world/horde_connector.ex
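As a rough sketch of the `{name, node}` idea: the module below is hypothetical (`MyApp.SharedCrdt` is not part of Horde or DeltaCrdt), and it assumes a DeltaCrdt release that provides `DeltaCrdt.AWLWWMap` and `set_neighbours/2`; older releases used `add_neighbours`, as mentioned above. Every node starts the CRDT under the same registered name, so each node can address the others without holding a pid.

```elixir
defmodule MyApp.SharedCrdt do
  # Hypothetical helper module, shown for illustration only.
  # Assumes the :delta_crdt dependency is available.

  # Child spec so the CRDT can be placed in a supervision tree on every node.
  # The process is registered locally under this module's name.
  def child_spec(_opts) do
    %{
      id: __MODULE__,
      start:
        {DeltaCrdt, :start_link,
         [DeltaCrdt.AWLWWMap, [name: __MODULE__, sync_interval: 100]]}
    }
  end

  # Build a {name, node} reference for the CRDT on each given node.
  def neighbours(nodes \\ Node.list()) do
    for node <- nodes, do: {__MODULE__, node}
  end

  # Point the local CRDT at the CRDTs on all currently connected nodes.
  # Each node has to do this for itself to get syncing in both directions.
  def connect do
    DeltaCrdt.set_neighbours(__MODULE__, neighbours())
  end
end
```

`connect/0` would need to be called again whenever cluster membership changes, for example from a process that monitors node up and down events.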
Horde.Registry mirrors the functionality of Elixir's own Registry, so all of the facilities available there are also available, but then in a distributed manner (Elixir's Registry also provides data storage for processes and registry-wide (meta) storage).
I would try it like this: use `put_meta` when you change your value, and use `meta` to read the value out in the `init` callback of your process, so that it can pick up where the previous process left off if it was killed (or the node it was on was shut down, e.g. during a deploy).
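A minimal sketch of that pattern might look like the following. The module name `MyApp.Counter`, the registry name `MyApp.Registry`, and the `"count"` key are all assumptions made for illustration; it also assumes a Horde release where `Horde.Registry.meta/2` returns `{:ok, value}` or `:error`, mirroring Elixir's `Registry.meta/2`.

```elixir
defmodule MyApp.Counter do
  use GenServer

  # Assumptions: MyApp.Registry is a Horde.Registry started elsewhere in the
  # supervision tree, and "count" is an arbitrary metadata key.
  @registry MyApp.Registry
  @meta_key "count"

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def increment(pid), do: GenServer.call(pid, :increment)

  @impl true
  def init(_opts) do
    # Pick up where the previous process left off, or start from zero
    # if nothing has been stored yet.
    count =
      case Horde.Registry.meta(@registry, @meta_key) do
        {:ok, value} -> value
        :error -> 0
      end

    {:ok, count}
  end

  @impl true
  def handle_call(:increment, _from, count) do
    new_count = count + 1
    # Store the new value in registry metadata on every change.
    :ok = Horde.Registry.put_meta(@registry, @meta_key, new_count)
    {:reply, new_count, new_count}
  end
end
```

Note that a freshly started node may not have received the metadata yet when `init` runs, so a real implementation might need to tolerate reading a stale or missing value.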
Glad to hear you're finding a use for Horde :+1:
Hi Derek
PR to share the count value across the nodes that comprise the horde.
https://github.com/derekkraan/horde/pull/44
Do you think using the `meta` and `put_meta` functions is a reasonable way to share, for example, a queue of records that need to be processed, so that if the instance that is doing the processing goes down the records remain in the queue and can be picked up by the new process being spawned? Do you have an upper bound on how much data one can reasonably expect to share in this way? I'm wondering if this metadata is perhaps only intended for sharing simple values rather than long lists.
Also, say there is a key-value pair where the value is a long list, and then the value of that list changes (removing an element from the list). Will the entire new list be synced if you do a `put_meta` call with the new list, or are Delta CRDTs smart enough to compute deltas between the values of the same key?
For example:
data = [1,2,3,4]
Horde.Registry.put_meta(some_horde_registry, "list", data)
# At this point [1,2,3,4] is shared across the cluster.
data = [1,2,3]
Horde.Registry.put_meta(some_horde_registry, "list", data)
What is shared across the cluster at this second `put_meta` call? A delta showing that `data` has changed by "-[4]", or the entire `data` value?
I've merged your PR.
To answer your question, DeltaCrdt will sync the entire value of `data` when you write it. There is no magic diffing going on here. The "deltas" we sync with DeltaCrdt are equivalent to the operations you can perform (i.e., "add key/value", "remove key", and "clear" each correspond to a single delta that cannot be further decomposed).
You can create new DeltaCrdts that do different things. For example, an add-wins Set would be possible. Even an add-wins Set wrapped in an Observed-Remove Map is a possibility. I am not aware of any "List" CRDTs though.
You can use Horde.Registry's meta for this, but be aware that the delta will be the size of the total list. This might be acceptable based on how large the list is and how often it is updated.
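One way to keep each delta small, sketched here under assumptions rather than as a recommendation from the thread, is to use DeltaCrdt directly and store every record under its own key, so that each "add key/value" or "remove key" delta carries a single record instead of the whole list. The module `MyApp.PendingRecords` and the `{:pending, id}` key shape are hypothetical, and the calls assume a DeltaCrdt release that provides `put/3`, `delete/2`, and `to_map/1` (older releases exposed `mutate` and `read` instead).

```elixir
defmodule MyApp.PendingRecords do
  # Hypothetical helper module, shown for illustration only.
  # `crdt` is the pid or name of a DeltaCrdt.AWLWWMap process.

  # One key per record: the delta for this write contains only `record`.
  def enqueue(crdt, id, record) do
    DeltaCrdt.put(crdt, {:pending, id}, record)
  end

  # Removing a processed record is a single "remove key" delta.
  def done(crdt, id) do
    DeltaCrdt.delete(crdt, {:pending, id})
  end

  # Collect all records that are still waiting to be processed.
  def pending(crdt) do
    for {{:pending, id}, record} <- DeltaCrdt.to_map(crdt), into: %{} do
      {id, record}
    end
  end
end
```

A map has no inherent order, so if processing order matters the `id` would have to encode it, for example as a timestamp or sequence number.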
Thanks Derek.
Much appreciated. I thought that the delta is probably computed based on the `mutate` call, which I think is what `put_meta` uses. I think we can engineer a solution that will work with what is possible.
Looking forward to using Horde in Production.
Great. I'm going to close this issue, but feel free to continue the discussion if you have any other questions.
I am loving Horde and learning a lot along the way. I successfully got the `HelloWorld` application to run with three different IEx instances and can see that only one instance of `SayHello` is running. However, when I ask each one `:how_many`, the count is different on each node, being the number of times that node has executed the `:say_hello` function.
I thought it would be simple to extend the example application to work with CRDT and share the count so that `how_many?` would return the total number of times the function has been called in the cluster. It doesn't seem to be as simple, and the documentation for CRDT doesn't make it clear how adding neighbours from other nodes will work.
It would be great if the example application could be extended to show how the various nodes will use DeltaCrdt to share data between them.