dolthub / dolt

Dolt – Git for Data
Apache License 2.0

Peer-to-peer dolt server? #629

Closed canadaduane closed 1 year ago

canadaduane commented 4 years ago

One of the use cases we have in mind is a cluster of interdependent services with shared data. We'd like each service to be able to make a change, then submit changes to various "peer" services in the cluster.

I see there is a way to add file-based, AWS, and GCS remotes. Is there also a way to create a peer-based remote of some kind?

reltuk commented 4 years ago

This is a neat idea, but we don't currently have plans for something like this. To check my understanding of the scenario: each service in the cluster would have all the other "peer" services configured as separate remotes in its dolt configuration, and when it makes updates, it pushes them to each remote? And each service can also receive pushes from its peers and will see updates that way as well?

The underlying primitives probably aren't too far off from making something like this possible, but it's sufficiently far afield from the way things are currently structured that it might be a little bumpy. In particular, a dolt workspace is more than just the chunk store that enables remote storage. A dolt remote is able to store branch refs and the Merkle DAG of commit and table data, but the workspace itself also stores things like the list of remotes (with authn settings), the currently checked out branch, the value of the database in the working set if it has moved from the branch ref, etc.
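To make that split concrete, here is a rough Go sketch of the two kinds of state described above. Every type and field name here (`RemoteState`, `Workspace`, `Dirty`) is hypothetical and chosen for illustration; this is not Dolt's actual data model, only a restatement of the paragraph as a data structure.

```go
package main

import "fmt"

// RemoteState is a hypothetical stand-in for what a remote can hold:
// branch refs pointing into the Merkle DAG of commits and table data.
// The chunks that make up the DAG itself are omitted for brevity.
type RemoteState struct {
	BranchRefs map[string]string // branch name -> commit hash
}

// Workspace is a hypothetical stand-in for local state. It includes
// everything a remote holds plus state that only exists locally.
type Workspace struct {
	Shared        RemoteState
	Remotes       map[string]string // remote name -> URL (authn settings omitted)
	CurrentBranch string            // currently checked out branch
	WorkingSet    string            // hash of the working set value
}

// Dirty reports whether the working set has moved away from the
// ref of the currently checked out branch.
func (w Workspace) Dirty() bool {
	return w.WorkingSet != w.Shared.BranchRefs[w.CurrentBranch]
}

func main() {
	w := Workspace{
		Shared:        RemoteState{BranchRefs: map[string]string{"main": "abc123"}},
		Remotes:       map[string]string{"origin": "file:///tmp/remote"},
		CurrentBranch: "main",
		WorkingSet:    "abc123",
	}
	fmt.Println(w.Dirty()) // working set matches the branch ref

	w.WorkingSet = "def456" // an uncommitted local change
	fmt.Println(w.Dirty())
}
```

Only the `Shared` part has an obvious home on a remote; a peer acting as a push target would still need somewhere to keep the rest.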

Another way to meet a P2P use case would be for everyone to rendezvous at central storage in something like an AWS remote. Obviously that's not exactly the same, but with non-overlapping fetch specs you could even have subsets of the clients (effectively) not communicating with other subsets of the clients. One of the big differences with centralized storage seems to be that it gives you a full mesh, whereas the P2P case allows more complicated communication topologies.

Does that help to provide some guidance? Sorry to disappoint :).

canadaduane commented 4 years ago

Thanks for the constructive feedback!

I think noms was headed in this direction (or achieved some of it?): https://github.com/attic-labs/noms/blob/master/doc/decent/about.md The "noms-chat" demo appears to be a P2P test case of this type of topology.

reltuk commented 4 years ago

Noms definitely headed further down this path. Noms is structured so that it's easily embeddable as an application library, and they added the ability for the application itself to bring its own chunkstore (basically, content-addressed storage with a check-and-set operation on a small metadata payload containing the root pointer). The chat samples linked to from that documentation page definitely worked at some point, but they ended up removing the ipfs chunkstore implementation in this commit: https://github.com/attic-labs/noms/commit/24dc7ad17000f04cf08179dc2973b0f24d13f525 Maybe they moved it somewhere else and made it use the external protocols machinery instead? It was a pretty simple, straightforward wrapper around the chunkstore interface in any case. Which is all to say: I definitely think the primitives are generally in the right place, and it's possible noms + ipfs can actually get pretty close. The idea seems super cool, but we're definitely intensely focused on the git-workflow-like use cases for now.
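For readers unfamiliar with the shape of such a chunkstore, here is a minimal in-memory sketch in Go. It is not the noms interface: the type `MemChunkStore`, its method names, and the choice of hex-encoded SHA-256 as the content address are all assumptions made for illustration. It only shows the two ideas mentioned above, immutable chunks keyed by their hash and a single root pointer updated by check-and-set.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// Hash is the content address of a chunk. Hex-encoded SHA-256 is an
// assumption for this sketch, not what noms or Dolt necessarily use.
type Hash string

// MemChunkStore is a hypothetical in-memory chunk store: immutable
// chunks keyed by hash, plus one small mutable root pointer.
type MemChunkStore struct {
	mu     sync.Mutex
	chunks map[Hash][]byte
	root   Hash
}

func NewMemChunkStore() *MemChunkStore {
	return &MemChunkStore{chunks: make(map[Hash][]byte)}
}

// Put stores data under the hash of its contents and returns that hash.
// Storing the same bytes twice yields the same address.
func (s *MemChunkStore) Put(data []byte) Hash {
	sum := sha256.Sum256(data)
	h := Hash(hex.EncodeToString(sum[:]))
	s.mu.Lock()
	defer s.mu.Unlock()
	s.chunks[h] = append([]byte(nil), data...)
	return h
}

// Get returns a copy of the chunk stored at h, if present.
func (s *MemChunkStore) Get(h Hash) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, ok := s.chunks[h]
	if !ok {
		return nil, false
	}
	return append([]byte(nil), data...), true
}

// Root returns the current root pointer ("" when nothing is committed).
func (s *MemChunkStore) Root() Hash {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.root
}

// CommitRoot is the check-and-set step: the root moves to next only if
// it still equals expected. It reports whether the update was applied.
func (s *MemChunkStore) CommitRoot(expected, next Hash) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.root != expected {
		return false
	}
	s.root = next
	return true
}

func main() {
	store := NewMemChunkStore()
	first := store.Put([]byte("commit 1"))
	fmt.Println(store.CommitRoot("", first)) // root was empty, so this applies

	second := store.Put([]byte("commit 2"))
	// A writer that still believes the root is empty is rejected.
	fmt.Println(store.CommitRoot("", second))
	// Retrying against the current root succeeds.
	fmt.Println(store.CommitRoot(first, second))
}
```

The check-and-set on the root is what lets concurrent writers detect that they raced: the loser sees a failed commit and has to re-read the root and merge before trying again.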

canadaduane commented 4 years ago

Understood, thank you!

zachmu commented 3 years ago

This is something we will be revisiting over time, but for now the guidance remains to use a centralized remote as the coordination point for multiple clones / forks.

Dolt also ships with a file-based remote HTTP server, and you can set up file-based remotes with no server at all. So it's possible to do this without involving DoltHub or any cloud services.

timsehn commented 1 year ago

I'm going to resolve this. I don't think it's worth tracking as an issue.