dat-ecosystem / dat

:floppy_disk: peer-to-peer sharing & live synchronization of files via command line
https://dat.foundation
BSD 3-Clause "New" or "Revised" License

ship the dat beta #195

Closed max-mapper closed 9 years ago

max-mapper commented 10 years ago

we're shooting for a dat beta release in April 2015. note that this doesn't include all the stuff in the overall project roadmap

here's a link to the list of issues for the beta milestone

These are the repos where most of the work will be happening:

We are seeking feedback on our Beta APIs:

CLI: https://github.com/maxogden/dat/blob/beta/beta-cli-api.md
JS: https://github.com/maxogden/dat/blob/beta/beta-js-api.md

use this thread to discuss anything that doesn't fit in one of the issues above

ekg commented 9 years ago

@mafintosh @maxogden and I had a conversation a few days ago about data models for the beta release. We're interested in satisfying a number of constraints in order to support several important features. A few stand out:

  * multi-master replication
  * checkout/rollback
  * data integrity and efficient synchronization
  * forking/merging

I'll go over the rough implications of these for the model so that the proposed changes can be properly discussed and grokked. First, it's worth describing the data model as it stands in dat alpha.

As of 68ae983c7, the data model in level-dat (dat's default backend) consists of three components.

  1. The current table: this is where the current state of the data can be found quickly. dat keeps a reference to the current version of every key in the data table here. If you run dat cat, dat uses this table to quickly return the current version of the data.
  2. The data table: this is where the data lives. The primary key picked on import is used to generate the key, which also stores the object's version number, the minimum required for version control of objects sharing the same key. The current format allows for namespaces; only the default ('') and "internalschema" appear to be used currently.
  3. The log or change table: this is where dat records the series of changes made to the objects (rows) in the data table.
| table | key | value |
| --- | --- | --- |
| current | +c +namespace +key | version |
| data | +d +namespace +key +version | object (encoded in protobuf format) |
| log/change | +s +log_id | [log_id, key, from_version, to_version, namespace] |

Note that "+" here stands for "ÿ" (aka \xff). You might see this if you're inspecting the format via something like `superlevel .dat/store.dat createReadStream`.
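To make the layout concrete, here's a rough sketch of how keys in the alpha keyspace could be assembled. `buildKey`, the sample values, and the helper itself are illustrative only, not dat's actual API; the `\xff` separator and the `c`/`d`/`s` prefixes come from the table above.

```javascript
// Illustrative sketch of the alpha keyspace layout, NOT dat's real code.
// '\xff' (shown as "+" in the table) separates key components; in a
// lexicographically ordered store like leveldb it sorts after printable ASCII.
const SEP = '\xff';

function buildKey(prefix, parts) {
  // key = ÿ<prefix>ÿ<part1>ÿ<part2>...
  return SEP + prefix + parts.map(p => SEP + p).join('');
}

// "current" table entry: ÿc ÿ<namespace> ÿ<key>  ->  version
const currentKey = buildKey('c', ['', 'row-17']); // '' = default namespace

// "data" table entry: ÿd ÿ<namespace> ÿ<key> ÿ<version>  ->  protobuf object
const dataKey = buildKey('d', ['', 'row-17', '3']);
```

Because all components of one logical table share a prefix, a range scan over `ÿd…` walks the whole data table in key order.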

The four constraints (multi-master, checkout/rollback, data integrity and efficient synchronization, and forking/merging) suggest a particular set of changes. One approach would be as follows:

| table | key | value |
| --- | --- | --- |
| current | +c +namespace +key | log_id |
| data | +d +namespace +key +log_id +parent_hash +branch | object (encoded in protobuf format) |
| log/change | +s +log_id | [log_id, key, [parent_log_ids,...], [parent_hashes,...], hash(new+[parent_hashes]), ns, branch] |

Storing the log_id in place of the version in the data table allows us to roll back quickly, and in ordered data stores (like leveldb) the last key for an object will correspond to its most recently seen version. A coherent sequence of versions can then be generated from the ordered history of each object rather than being stored and directly manipulated by users.
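A small sketch of the ordering property this relies on: if log_ids are encoded so that lexicographic order matches numeric order (zero-padding is one illustrative way to do that; the width of 12 and `pad` itself are assumptions, not dat's scheme), the newest version of an object is simply the last key in its range.

```javascript
// Sketch: zero-padded log_ids make lexicographic key order match numeric
// order, so the last data-table key for an object is its newest version.
function pad(logId, width = 12) {
  return String(logId).padStart(width, '0');
}

// Simulated slice of the data table for one object, sorted the way a
// lexicographically ordered store like leveldb would sort it:
const keys = [5, 40, 7, 112].map(id => 'row-17\xff' + pad(id)).sort();

// The most recently seen version is simply the last key in the range,
// e.g. reachable via a reverse range scan limited to one result.
const newest = keys[keys.length - 1];
```

Without the padding (or some equivalent fixed-width encoding), `'112'` would sort before `'40'` and the "last key wins" shortcut would break.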

We extend the key space to include branches, but it isn't clear to me where these should fall in the keys. For instance, they could be appended after the namespace, but that would limit the number of branches/masters that could be handled efficiently, because each lookup would require checking every branch/master the repository was aware of.

We record the parent_hash (the preceding hash in the Merkle tree for this object) as well as the source branch (or repo UUID) for each entry in the table. By recording the hashes in the Merkle tree, as well as the local log_ids of the parents of each object, the change log can be used to traverse the series of forks and merges through which each object's history passes.

What am I missing? How does this break? We're not going to be able to solve this without implementing and testing things, but hopefully a little discussion can get us closer to something generally optimal before a lot of time goes into testing various options.

max-mapper commented 9 years ago

just an update: major work is underway now in both http://github.com/maxogden/dat-core and https://github.com/maxogden/dat/tree/beta. we hope to release in the next couple of weeks

sckott commented 9 years ago

:rocket:

webmaven commented 9 years ago

:shipit:

joshmarinacci commented 9 years ago

any updates?

max-mapper commented 9 years ago

beta branch has been merged into the master branch. still some work to do before a release, but for now you can `npm install dat@7.0.0-pre` to test it out