replication - Githubissues

juliangruber commented 11 years ago

@gedw99: "I am building a 3D CAD modelling system and have tons of JSON data that I need to store on servers in many data centers. I run offline using IndexedDB, so I also need to sync.

Originally I used PouchDB and CouchDB.

But I want to replace all of it with LevelDB."

What's the merge strategy?
Will it be master-only?
What does your topology look like?

ghost commented 11 years ago

The merge strategy will need a difference engine based on date-time stamps, both between client and server and between servers. Using the same merge strategy everywhere makes sense because it's peer-to-peer based. This is possible because all data changes are timestamped and kept: the data is like an edit list, and the user can always go forward or back in time through the edits in the CAD software. So that makes the data replication easy.

NOT master only. Peer to peer multi master.

The topology is client to server, server to server, and maybe client to client. I say maybe because I doubt that JavaScript running ONLY in a browser will work, given the various security implications, but it can certainly work with a replication token supplied from the server.

juliangruber commented 11 years ago

Please avoid so many typos; your text is difficult to read.

ghost commented 11 years ago

ok fixed it up. sorry

juliangruber commented 11 years ago

So for every data point of two merging bodies, you accept the one with the newest timestamp. That's easy; that is exactly what scuttlebutt / crdt will do for you. Or do you have any special requirements for the underlying data structure?
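A minimal sketch of the "newest timestamp wins" rule described above. It is written in Python only for brevity and is not the scuttlebutt or crdt API; the names `lww_merge`, `Stamp` and `Replica` are made up for illustration. It assumes each value carries a `(time, replica_id)` stamp, so that two edits with the same clock time still resolve the same way on every peer.

```python
from typing import Any, Dict, Tuple

# Hypothetical layout: key -> ((time, replica_id), value).
# The replica id is only there to break ties between equal clock times.
Stamp = Tuple[float, str]
Replica = Dict[str, Tuple[Stamp, Any]]


def lww_merge(a: Replica, b: Replica) -> Replica:
    """Merge two replicas, keeping the entry with the newest stamp per key."""
    merged = dict(a)
    for key, (stamp, value) in b.items():
        if key not in merged or stamp > merged[key][0]:
            merged[key] = (stamp, value)
    return merged


client: Replica = {
    "wall.height": ((10.0, "client"), 2.4),
    "wall.colour": ((12.0, "client"), "white"),
}
server: Replica = {
    "wall.height": ((11.0, "server"), 2.7),
    "roof.pitch": ((9.0, "server"), 30),
}

merged = lww_merge(client, server)
print(merged["wall.height"][1])
```

Because the stamps are totally ordered, merging client into server gives the same result as merging server into client, which is what lets the same rule be used client to server and server to server.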

ghost commented 11 years ago

Yes, this is true. I take the latest. I read a little about scuttlebutt but have not used it in anger yet.

I also have to store images for the CAD materials. I would also want to save the image files in the DB. This is dumb, I know, from a speed point of view, but it's needed because I need the images to be saved in the offline database on the client side.

Do you know if scuttlebutt works client side on top of IndexedDB?

Server-to-server replication: from the point of view of server-to-server multi-master, I need to make sure I have 3 copies in each data center. I am not sure what is appropriate here. I am wondering if there are any levelup modules that handle this for me?

Data-level security: I will store the user ID against the data. Simple.

dominictarr commented 11 years ago

sweet, this is super cool!

I have also been thinking about a cad system. I've built a few boats, such as http://www.flickr.com/photos/dominictarr/sets/72157594180332221/

And so it's basically inevitable that one day I'll need to write my own boat design software, then use it to design a boat.

So, you are basically gonna need vector data, correct?

There isn't any scuttlebutt for vector data yet, but there certainly could be. (Scuttlebutt itself is a superclass that handles the replication part; see the links in the repo.)

I also have not yet implemented a scuttlebutt that has roll-back/checkout/undo - but it would basically be a matter of keeping more history -- fairly simple.
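To make "keeping more history" concrete, here is a rough sketch (in Python for brevity, and not an existing scuttlebutt subclass) of a store that keeps every timestamped edit and can rebuild the state as of any point in time. The class name `History` and its methods are hypothetical.

```python
from typing import Any, Dict, List, Tuple


class History:
    """Keeps every edit instead of only the latest value per key."""

    def __init__(self) -> None:
        # Each edit is (timestamp, key, value), kept sorted by timestamp.
        self.edits: List[Tuple[float, str, Any]] = []

    def record(self, ts: float, key: str, value: Any) -> None:
        self.edits.append((ts, key, value))
        self.edits.sort(key=lambda edit: edit[0])

    def checkout(self, ts: float) -> Dict[str, Any]:
        """Replay edits up to and including ts to rebuild that state."""
        state: Dict[str, Any] = {}
        for edit_ts, key, value in self.edits:
            if edit_ts > ts:
                break
            state[key] = value
        return state


h = History()
h.record(1.0, "hull.length", 6.0)
h.record(3.0, "hull.length", 6.5)
h.record(2.0, "hull.beam", 2.1)  # edits may arrive out of order

print(h.checkout(2.0))
print(h.checkout(3.0))
```

Undo is then just a checkout at an earlier timestamp; the cost is that the edit list only ever grows.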

what kind of data structures are you planning? I am very happy to help figure out something that will work well with replication!

see also https://github.com/dominictarr/snob <- this uses a more git-like architecture, which may suit your application, (this can probably be updated to fit a scuttlebutt type model, but will be more work than just using scuttlebutt)

ghost commented 11 years ago

Hey Dom,

this is great news.

I should also mention that we are a non-profit in Germany. Our main thing is biomimicry. So we are really trying to change the world with this thing.

For me this all came about because I wanted to make building easier and to be able to actually build generative houses.

So the CAD system is generative as well as "traditional drawing". This means we also need to hold code in the database, both JS and binary. The binary is run on GPUs using WebCL; I can discuss more of that later.

As far as data structures go: yes, it's vector, but we also need to hold binary images. All of it needs to be part of the database.

But SNOB looks damn useful. CAD and version control are often lumped together in a not-too-nice marriage, so I very much like having version control in from day one.

Then there is CRDT. Non-destructive editing is also one way of doing version control in design systems. I assume you can go back to a point in time with it. You might think of it as a repo where every change is a snapshot. But can you merge designs with it, like we do on GitHub, as with your SNOB?

So I am trying to work out whether SNOB or CRDT is the better one.

navaru commented 11 years ago

@gedw99 can you provide some real JSON as an example? I wonder how the data model looks. Your project sounds interesting. Is it open source? Is there a website?

dominictarr commented 11 years ago

@gedw99 sorry for the late response!

CRDT and SNOB have different use cases.

CRDT is designed for the case where you don't need the full history, although it would be possible to add backtracking. I have considered this; I just haven't had a use case for it yet.

SNOB has the same architecture as git, with branches and merges and that stuff, only you have pluggable diff tools instead of only handling text. If you can write diff, patch, and diff3 operations for your data structure, then snob can version it.
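As an illustration of what "diff, patch, diff3 for your data structure" could mean, here is a sketch for flat key/value objects, in Python for brevity. The function names and signatures are assumptions chosen for this example and are not snob's actual plug-in interface; the conflict rule in `diff3` (my change wins) is also just one possible choice.

```python
from typing import Any, Dict

_DELETED = object()  # marker for "this key was removed"


def diff(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return the changes that turn old into new."""
    changes = {k: v for k, v in new.items() if k not in old or old[k] != v}
    for k in old:
        if k not in new:
            changes[k] = _DELETED
    return changes


def patch(doc: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a set of changes and return the new document."""
    result = dict(doc)
    for k, v in changes.items():
        if v is _DELETED:
            result.pop(k, None)
        else:
            result[k] = v
    return result


def diff3(base: Dict[str, Any], mine: Dict[str, Any],
          theirs: Dict[str, Any]) -> Dict[str, Any]:
    """Three-way merge: combine both sides' changes relative to base.

    If both sides changed the same key, mine wins (an arbitrary rule).
    """
    merged = diff(base, theirs)
    merged.update(diff(base, mine))
    return merged


base = {"width": 1, "height": 2}
mine = {"width": 5, "height": 2}
theirs = {"width": 1, "height": 3, "depth": 4}

print(patch(base, diff3(base, mine, theirs)))
```

For vector or CAD data the interesting work would be in choosing what counts as a "key" and how to resolve a real conflict, which this sketch sidesteps.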

I started with snob, but then realized that most applications were much simpler, and wrote crdt and scuttlebutt.

What do you mean by generative?

ghost commented 11 years ago

thanks Dom for the explanation

SNOB sounds good, but I doubt I can do real-time updates for 2 users looking at the same data with it?

Do a Google image search for "generative architecture" and you will see what sort of things we can print and make with it. As an architect I reject the idea that everything we make and surround ourselves with must be orthogonal. 3D printing and 2D CNC techniques have opened the door to making non-orthogonal things.

dominictarr commented 11 years ago

@gedw99 both snob and crdt are realtime!

Aha, generative architecture is what the name suggests! This is really cool!

The tricky part in data replication is handling the case where two users have updated the data concurrently. "concurrent" means something like "the same time", but relates to the synchronizations between the users rather than the clock time. So if two users make an update starting from the same version, they create two parallel versions.
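One common way to tell "concurrent" apart from "newer" without looking at the clock is a version vector, where each version records how many updates it has seen from each user. The sketch below (Python for brevity) is a generic illustration of that idea and is not claimed to be how snob or crdt detect concurrency; `descends` and `relation` are hypothetical names.

```python
from typing import Dict

# Version vector: user id -> number of that user's updates this version has seen.
Version = Dict[str, int]


def descends(a: Version, b: Version) -> bool:
    """True if version a has seen every update that version b has seen."""
    return all(a.get(user, 0) >= count for user, count in b.items())


def relation(a: Version, b: Version) -> str:
    a_has_b = descends(a, b)
    b_has_a = descends(b, a)
    if a_has_b and b_has_a:
        return "equal"
    if a_has_b:
        return "a is newer"
    if b_has_a:
        return "b is newer"
    return "concurrent"


start = {"alice": 1}                  # both users synced to this version
alice_edit = {"alice": 2}             # alice updates starting from `start`
bob_edit = {"alice": 1, "bob": 1}     # bob updates starting from `start` too

print(relation(alice_edit, bob_edit))
```

Here neither edit has seen the other, so they are parallel versions no matter which one happened first on the wall clock; a version that has seen both, such as `{"alice": 2, "bob": 1}`, is newer than either.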

Snob and Crdt use two different approaches to merging these parallel versions.

Hmm, it's probably easier to get started with crdt - hmmm, I think you could probably port something that works with crdt to snob. It all depends on what the data structure looks like.

ghost commented 11 years ago

I will look into both.

I have about 10 years of experience programming. I know the patterns and theories. I just wanted to hear about the real-world, low-level differences from you, because you wrote it.

I will use CRDT as the operational transformation and see how it goes.

At this stage we are working on the WebCL aspects for the CAD kernel.

These guys have really cracked it. http://www.hastaladesign.com/?cat=22968

Do a video search for "Softkill Design".

The approach they are taking is based on biomimicry. It will be very successful and really solve many problems in the world. The biochemists will be busy though. We need organic-based polymers now to make them cheaper.

dominictarr commented 11 years ago

@gedw99 Okay, Great!

I'm super-busy until Monday, but after that I'll write up a wiki page about the differences between the replication approaches.

ghost commented 11 years ago

thanks mate - I will look at it.

Gerard

navaru commented 11 years ago

I think you'll need a CRDT based approach since you need to handle specific operations.

I learned Operational Transformation a year ago. In order to apply OT to a document editor you'll define:

- A document represents a string of characters, but when it comes to OT, a document is a list of changesets.
- A changeset is a group of edits made within a certain time by one user (~500ms), that may be canceled or propagated as
- Operational transformation deals with actions (operations) that will be performed on the document. An operation is a sequence of changesets (operation components).

There are two types of OT:

- primitive operation model (most implemented model)
  - string-wise: insert, delete, update
  - map app-specific logic to primitives
- app-specific operation model (has a more complex transaction layer)
  - n different operations => n * n transformation functions

OT has the downside that you need a server to handle the transactions; no peer-to-peer.
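To show what one of those transformation functions looks like, here is a sketch of the insert-against-insert case for a plain string, in Python for brevity. It is a textbook-style illustration rather than code from any particular OT library; `Insert`, `apply_op` and `transform` are hypothetical names, and the `site` field is an assumed tie-breaker for inserts at the same position.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int    # index in the document where the text goes
    text: str
    site: str   # id of the user who made the edit, used to break ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite op so it can be applied after `other` has been applied."""
    shifts = other.pos < op.pos or (other.pos == op.pos and other.site < op.site)
    if shifts:
        return op._replace(pos=op.pos + len(other.text))
    return op


doc = "abc"
a = Insert(1, "X", "alice")
b = Insert(1, "Y", "bob")   # concurrent insert at the same position

# Each side applies its own edit first, then the transformed remote edit.
alice_view = apply_op(apply_op(doc, a), transform(b, a))
bob_view = apply_op(apply_op(doc, b), transform(a, b))
print(alice_view, bob_view)
```

With only one operation type there is one function to write. Adding delete would already need insert/insert, insert/delete, delete/insert and delete/delete, which is the n * n growth mentioned in the list above.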

I'm still researching CRDT.

How complex is your data model?

Anyway, biomimicry is awesome!

ghost commented 11 years ago

thanks Eugen

OK, this pretty much aligns with what I have learnt about OT patterns too. I also still do not fully understand how that differs from CRDT.

There is an interesting group that has written a system, called substance.io, that does OT patterns but without using OT. They have it working well.

P2P is not vital. It was just a nice-to-have. An authoritative server can do the merging transactions.

I built a stereolithographic printer last year and played around with different materials for printing houses. It's a fast process, but the photopolymers are expensive and break down over time. You can mix carbon nanotube emulsions in them to help, but they will still break down. The key is finding an organic material.

Right now I am playing around with a LightScribe machine and the production of graphene. I intend to use this process to make 3D structures out of graphene.

The great thing is that it also makes a great battery. Handy for anything you want to make, since the structure itself is the battery.

G
