dominictarr / scuttlebutt

peer-to-peer replicatable data structure

Mention expiry-model #19

Closed Raynos closed 11 years ago

dominictarr commented 11 years ago

Can we change the text? Unless we explain in detail what "non-leaky" specifically means, I feel it raises too many questions. I'd rather it just said "memory capped model with expiring keys".

Scuttlebutt isn't leaky if you stop adding keys. The thing here is that expiry-model is designed for adding keys forever.

Also, how close is this to scuttlebutt/model?

Since model is such a generic word, you should turn it into a link if it's close, to disambiguate.

Raynos commented 11 years ago

@dominictarr scuttlebutt is leaky since the vector clock grows forever.

Raynos commented 11 years ago

It's close to scuttlebutt/model in that it has the exact same API surface.

But when you set objects they get merged instead of overwritten (crdt/Row style).

And there's aggressive cleanup in the key space and in the vector clock.
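To make the difference in the set contract concrete, here is a minimal sketch in plain JavaScript. It does not use scuttlebutt/model or expiry-model; `overwriteSet` and `mergeSet` are hypothetical names, and the shallow merge is an assumption about what "merged" means here.

```javascript
// Hypothetical sketch: NOT the scuttlebutt/model or expiry-model source.

// Overwrite semantics: the new value replaces whatever was stored.
function overwriteSet(store, key, value) {
  store[key] = value;
  return store[key];
}

// Merge semantics: new fields are shallow-merged into the stored object.
function mergeSet(store, key, value) {
  store[key] = Object.assign({}, store[key], value);
  return store[key];
}

const a = {};
overwriteSet(a, 'peer', { host: '10.0.0.1' });
overwriteSet(a, 'peer', { port: 8000 });

const b = {};
mergeSet(b, 'peer', { host: '10.0.0.1' });
mergeSet(b, 'peer', { port: 8000 });

console.log(a.peer); // { port: 8000 }
console.log(b.peer); // { host: '10.0.0.1', port: 8000 }
```

Same call shape, different result, which is why the API surface can match while the contract does not.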

dominictarr commented 11 years ago

Right... well, I wouldn't call that exact same, because the contract for model.set is different.

The vector clock only grows if the number of nodes that write to it grows. If you have thousands of nodes that write to a document, then the vector clock grows, sure.

The user needs to not do that.

The default scuttlebutt does not manage that for you, but see udid for a good way to reuse your node ids, which prevents this problem.
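A rough illustration of that point, assuming a vector clock is just a map from writer id to the latest timestamp seen from that writer. This is plain JavaScript, not scuttlebutt's internals; `recordWrite` and the id formats are made up for the sketch.

```javascript
// Hypothetical sketch of a vector clock: one entry per writer id.
function recordWrite(clock, id, timestamp) {
  // Keep the latest timestamp seen from each writer.
  clock[id] = Math.max(clock[id] || 0, timestamp);
}

// Case 1: every connection invents a fresh id, so the clock gains
// one entry per connection and never shrinks.
const freshIds = {};
for (let i = 0; i < 1000; i++) {
  recordWrite(freshIds, 'conn-' + i, i + 1);
}

// Case 2: 10 nodes reuse persistent ids across the same 1000 writes,
// so the clock stays at 10 entries.
const stableIds = {};
for (let i = 0; i < 1000; i++) {
  recordWrite(stableIds, 'node-' + (i % 10), i + 1);
}

const freshSize = Object.keys(freshIds).length;
const stableSize = Object.keys(stableIds).length;
console.log(freshSize, stableSize); // 1000 10
```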

dominictarr commented 11 years ago

merged into 5.3.3

Raynos commented 11 years ago

@dominictarr say you have a scuttlebutt that contains the list of all nodes currently in a network topology. It's in memory on the server.

A shit ton of nodes are going to connect to it, ask for the list and add themselves to it. Unless you kill / restart the process, that vector clock only grows.

dominictarr commented 11 years ago

This is why it's important to understand what happens to your data physically. You can't just believe in abstractions.

The vector clock only needs to grow as large as the number of nodes, if you make the id persistent. Set this by doing var s = new Scuttlebutt(id) (or whatever Scuttlebutt subclass you want), or set s.id = id immediately after you create the instance.

For this particular use case, if you used ip:port as the id, then the vector clock is all the data you actually need. You could just use a hash of ip:port -> heartbeat and not even use scuttlebutt.

Here is a short example of exactly that! https://github.com/dominictarr/repred/blob/master/examples/peers.js Although it doesn't remove dead items in the example, that would be about 3 lines...
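For readers who don't want to follow the link, a sketch of the same idea in plain JavaScript, including the removal of dead items. This is not the code from the linked example; `heartbeat`, `merge`, `removeDead` and the `maxAge` parameter are hypothetical names chosen for illustration.

```javascript
// Hypothetical peer table: 'ip:port' -> last heartbeat time in ms.
function heartbeat(peers, id, now) {
  // Never move a heartbeat backwards.
  peers[id] = Math.max(peers[id] || 0, now);
}

// Merging two tables keeps the newest heartbeat per peer.
function merge(local, remote) {
  for (const id of Object.keys(remote)) {
    heartbeat(local, id, remote[id]);
  }
  return local;
}

// Remove peers not heard from within maxAge ms.
function removeDead(peers, now, maxAge) {
  for (const id of Object.keys(peers)) {
    if (now - peers[id] > maxAge) delete peers[id];
  }
  return peers;
}

const peers = {};
heartbeat(peers, '10.0.0.1:8000', 1000);
heartbeat(peers, '10.0.0.2:8000', 5000);
merge(peers, { '10.0.0.1:8000': 900, '10.0.0.3:8000': 6000 });
removeDead(peers, 7000, 3000);

console.log(Object.keys(peers)); // [ '10.0.0.2:8000', '10.0.0.3:8000' ]
```

A peer that was removed can reappear if another node gossips a stale entry for it, so `removeDead` would need to run on every merge or on a timer.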

How many nodes are you intending to connect to?

Raynos commented 11 years ago

A p2p topology should support millions of nodes. With a centralized expiry model containing the list of nodes in the topology, it would for example contain the most recent 500 nodes and that's your bootstrapping list to join the topology.

Expiry model is designed to not grow in memory for this use case.
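A minimal sketch of what "contains the most recent 500 nodes" could look like, in plain JavaScript. This is an assumption-laden illustration, not the expiry-model implementation: `CappedRecent` is a hypothetical class, and it evicts by insertion order only, with no time-based expiry and no replication.

```javascript
// Hypothetical memory-capped store: NOT the expiry-model implementation.
class CappedRecent {
  constructor(maxKeys) {
    this.maxKeys = maxKeys;
    this.entries = new Map(); // a Map iterates in insertion order
  }

  set(key, value) {
    // Re-inserting moves the key to the "most recent" end.
    this.entries.delete(key);
    this.entries.set(key, value);
    // Evict the oldest keys once the cap is exceeded.
    while (this.entries.size > this.maxKeys) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  keys() {
    return Array.from(this.entries.keys());
  }
}

const recent = new CappedRecent(500);
for (let i = 0; i < 10000; i++) {
  recent.set('node-' + i, { seen: i });
}

console.log(recent.entries.size); // 500
console.log(recent.keys()[0]);    // node-9500
```

However many nodes announce themselves, memory use is bounded by the cap, and the surviving keys serve as the bootstrapping list.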

dominictarr commented 11 years ago

Yes, expiring records works in this situation, because a gossip protocol will still work with approximate data.