Closed by Raynos 11 years ago
@dominictarr scuttlebutt is leaky since the vector clock grows forever.
It's close to scuttlebutt/model in that it has the exact same API surface.
But when you set objects they get merged instead of overwritten (crdt/Row
style).
And there's aggressive cleanup in the key space and in the vector clock.
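To make the difference in set semantics concrete, here is a minimal sketch. overwriteSet and mergeSet are hypothetical helpers, not the API of scuttlebutt/model or expiry-model, and the shallow merge is only an assumption about what "merged" means here.

```javascript
// Hypothetical helpers for illustration only; neither library exposes these names.

// Overwrite semantics: the new value replaces whatever was stored under the key.
function overwriteSet(store, key, value) {
  store[key] = value
}

// Merge semantics (assumed shallow): fields of the new object are layered
// on top of the fields already stored under the key.
function mergeSet(store, key, value) {
  var current = store[key] || {}
  var merged = {}
  var k
  for (k in current) merged[k] = current[k]
  for (k in value) merged[k] = value[k]
  store[key] = merged
}

var overwritten = {}
overwriteSet(overwritten, 'peer', { host: 'example.local' })
overwriteSet(overwritten, 'peer', { port: 8000 })

var mergedStore = {}
mergeSet(mergedStore, 'peer', { host: 'example.local' })
mergeSet(mergedStore, 'peer', { port: 8000 })

console.log(overwritten.peer) // only the port survives
console.log(mergedStore.peer) // host and port are both present
```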
Right... well, I wouldn't call that exact same, because the contract for model.set is different.
The vector clock only grows if the number of nodes that write to it grows. If you have thousands of nodes that write to a document, then the vector clock grows, sure.
The user needs to not do that.
the default scuttlebutt does not manage that for you, but see udid for a good way to reuse your node ids, preventing this problem.
merged into 5.3.3
@dominictarr say you have a scuttlebutt that contains the list of all nodes currently in a network topology. It's in memory on the server.
A shit ton of nodes are going to connect to it, ask for the list, and add themselves to it. Unless you kill or restart the process, that vector clock only grows.
This is why it's important to understand what happens to your data physically. You can't just believe in abstractions.
The vector clock only needs to grow as large as the number of nodes,
if you make the id persistent.
Set this by going var s = new Scuttlebutt(id)
(or whatever Scuttlebutt subclass you want)
or set s.id = id
immediately after you created the instance.
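A rough sketch of why the persistent id matters. This is a toy vector clock, not scuttlebutt's internals; VectorClock and the example addresses are made up for illustration. It just shows that the clock holds one entry per distinct writer id, so reusing ids keeps it bounded.

```javascript
// Toy vector clock: one entry per distinct writer id.
// Hypothetical, for illustration; not how scuttlebutt stores its clock.
function VectorClock() {
  this.clock = {}
}

// Record an update from `id`, keeping only the latest timestamp per id.
VectorClock.prototype.update = function (id, ts) {
  if (!(id in this.clock) || this.clock[id] < ts) this.clock[id] = ts
}

VectorClock.prototype.size = function () {
  return Object.keys(this.clock).length
}

// Case 1: every connection shows up with a fresh id,
// so the clock gains one entry per connection.
var fresh = new VectorClock()
for (var i = 0; i < 1000; i++) fresh.update('node-' + i, i)

// Case 2: three machines reconnect many times but keep their ids,
// so the clock never holds more than three entries.
var reused = new VectorClock()
var ids = ['10.0.0.1:8000', '10.0.0.2:8000', '10.0.0.3:8000']
for (var j = 0; j < 1000; j++) reused.update(ids[j % ids.length], j)

console.log(fresh.size(), reused.size()) // 1000 3
```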
For this particular usecase, if you used ip:port
as the id, then the vector clock is all the data you actually need.
you could just use a hash of ip:port -> heartbeat
and not even use scuttlebutt.
Here is a short example of exactly that! https://github.com/dominictarr/repred/blob/master/examples/peers.js although, it doesn't remove dead items in the example, that would be about 3 lines...
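For reference, a self-contained sketch of the ip:port -> heartbeat idea, including removal of dead items. This is not the code from the linked example; PeerTable, its methods, and the 30 second timeout are assumptions made up for illustration.

```javascript
// Hypothetical peer table: 'ip:port' -> time of last heartbeat (ms).
function PeerTable(maxAge) {
  this.maxAge = maxAge
  this.peers = {}
}

// Record a heartbeat, keeping the newest timestamp seen for each id.
PeerTable.prototype.heartbeat = function (id, now) {
  if (!(id in this.peers) || this.peers[id] < now) this.peers[id] = now
}

// Merge a table received from another node: newest heartbeat per id wins.
PeerTable.prototype.merge = function (remote) {
  for (var id in remote) this.heartbeat(id, remote[id])
}

// Remove dead items: anything not heard from within maxAge.
PeerTable.prototype.prune = function (now) {
  for (var id in this.peers) {
    if (now - this.peers[id] > this.maxAge) delete this.peers[id]
  }
}

PeerTable.prototype.list = function () {
  return Object.keys(this.peers).sort()
}

var table = new PeerTable(30000)
table.heartbeat('10.0.0.1:8000', 1000)
table.heartbeat('10.0.0.2:8000', 20000)
table.merge({ '10.0.0.1:8000': 500, '10.0.0.3:8000': 25000 })
table.prune(40000)

console.log(table.list()) // [ '10.0.0.2:8000', '10.0.0.3:8000' ]
```

Since the id is reused across reconnects, the table can only be as large as the number of live ip:port pairs.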
How many nodes are you intending to connect to?
A p2p topology should support millions of nodes. With a centralized expiry model containing the list of nodes in the topology, it would for example contain the most recent 500 nodes and that's your bootstrapping list to join the topology.
Expiry model is designed to not grow in memory for this use case.
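A minimal sketch of what "does not grow in memory" could look like for the most-recent-500 case. RecentNodes is a hypothetical name and this is not expiry-model's actual implementation, which may expire by age rather than by count. It only shows that a capped structure stays the same size no matter how many nodes join.

```javascript
// Hypothetical capped list of the most recently seen nodes.
function RecentNodes(limit) {
  this.limit = limit
  this.nodes = new Map() // Map iterates in insertion order, oldest first
}

RecentNodes.prototype.add = function (id, info) {
  // Delete first so that re-adding a known node moves it to the newest position.
  this.nodes.delete(id)
  this.nodes.set(id, info)
  // Evict the oldest entries until we are back under the cap.
  while (this.nodes.size > this.limit) {
    var oldest = this.nodes.keys().next().value
    this.nodes.delete(oldest)
  }
}

RecentNodes.prototype.list = function () {
  return Array.from(this.nodes.keys())
}

var recent = new RecentNodes(500)
for (var k = 0; k < 100000; k++) recent.add('node-' + k, { seen: k })

console.log(recent.list().length) // 500
```

A joining node would then bootstrap from recent.list(), which is approximate but bounded.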
Yes, expiring records works in this situation, because a gossip protocol will still work with approximate data.
can we change the text? Unless we explain in detail what "non-leaky" specifically means, then I feel it begs too many questions. I'd rather it just said "memory capped model with expiring keys"
Scuttlebutt isn't leaky if you stop adding keys. the thing here is that expiry-model is designed for adding keys forever.
Also, how close is this to scuttlebutt/model?
since model is such a generic word, you should turn it into a link if it's close, to disambiguate.