lol
As of right now, it seems like an easily shareable, syncing file format: essentially git for large datasets. It's structured a bit unusually because it's meant to hook into other scripts and applications that actually do the data transformations. It also seems like they've made a move toward sharing scientific data, which was not the original purpose. But the docs don't really explain a clear use case, and it's still in alpha, so that could have something to do with it.
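To make the "hooks into other scripts" part concrete, here's a minimal sketch of what one transform step in such a pipeline could look like (illustrative only; it assumes records flow between tools as newline-delimited JSON, and it isn't any particular dat API):

```js
// Illustrative sketch, not dat's API: a transform step in a Unix-style pipeline.
// Reads newline-delimited JSON on stdin, rewrites each record, emits it on stdout.
var readline = require('readline');

var rl = readline.createInterface({ input: process.stdin });

rl.on('line', function (line) {
  if (!line.trim()) return;
  var record = JSON.parse(line);
  record.name = String(record.name).toUpperCase(); // hypothetical transformation
  process.stdout.write(JSON.stringify(record) + '\n');
});
```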
@balupton http://maxogden.github.io/get-dat/
Haha! You made me lol.
There are 3 of us working on it now, and we're in the process of finishing beta. We'll update the website with some tutorials, getting started guides, and stuff like that. Thanks @dfockler @nichoth and @gobengo :)
Thanks everyone, the talk turned out to be the best resource for me, as it goes into the use case: what it does, when to use it, why it's important, etc. The walkthrough isn't that helpful initially, as it covers the how rather than the why, the what, and the what-ifs, which you need before making the investment in the how.
That being said, now that I know what it is, it seems really nifty. Keen to follow this project. Looking forward to the new website and marketing.
I can't quite figure out what the hell it is either... but it's really cool and has amazing amounts of potential.
It seems like a lot of the promised features are not implemented yet. See #296 and #300 for recent examples.
I found the 'get-dat' walkthrough very useful (and very technically impressive...).
I realized a cool use case, which is as a 'sink' for any script you would run that spits newline-delimited JSON to stdout. For example, JSON sourced from an API.
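A minimal sketch of that producer side might look like the following (the endpoint is hypothetical, and I've left out the dat end of the pipe since its import interface may still change):

```js
// Sketch of the producer side: fetch records from an API and emit
// newline-delimited JSON on stdout, ready to pipe into a sink like dat.
// The endpoint is hypothetical; any API returning a JSON array would do.
var https = require('https');

https.get('https://api.example.com/records', function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    JSON.parse(body).forEach(function (record) {
      process.stdout.write(JSON.stringify(record) + '\n');
    });
  });
});
```

The idea would then be to pipe that output straight into dat's import step on the command line.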
Last week I made something like that for work, livefyre-geo-collection. In the near future I'll take a stab at piping it into dat as a way to persist the results of the 'archive' command.
@balupton @nylen for some background, I just wrote this whitepaper draft: https://github.com/maxogden/dat/blob/master/docs/whitepaper.md
I'd be happy to receive questions/comments/PRs/suggestions/etc if you have the time! :)
@karissa cool, that was useful.
So... what do you do with the data once it is in dat?
It seems like the following:
Item 3 here seems more like it should be:
Or something along the lines of being able to work with the current latest/final data in our ideal structures for querying/rendering/etc.
My actual use case here is to develop a static site generator whose database is a local leveldb, mongodb, or pouchdb database, and which uses dat to import data into that local database from several external sources (prismic.io, wordpress.com, ghost.org, tumblr.com, soundcloud.com, etc.), allowing a person's online data to be consolidated and rendered nicely as an always up-to-date personal portfolio website. Idea brief.
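To sketch the import step of that idea (purely hypothetical: it assumes the records arrive as newline-delimited JSON on stdin, e.g. exported from dat, with source and id fields on each record):

```js
// Hypothetical import step for the static-site idea: read newline-delimited
// JSON records on stdin and persist them into a local leveldb.
var level = require('level');
var readline = require('readline');

var db = level('./site-data');
var rl = readline.createInterface({ input: process.stdin });

rl.on('line', function (line) {
  if (!line.trim()) return;
  var record = JSON.parse(line);
  // 'source' and 'id' are assumed fields marking where each record came from
  // (wordpress.com, soundcloud.com, ...) so the generator can query by source.
  db.put(record.source + '!' + record.id, JSON.stringify(record), function (err) {
    if (err) console.error('failed to store record', err);
  });
});
```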
@karissa thanks for the whitepaper. I found it helpful.
I'd like to offer some general feedback: You close the first section with ...
> We introduce Dat, a version-controlled distributed database and data tool that has the user interface of a version control system (VCS).
... and follow with a nice summary of key features. You then compare/contrast w/ Kafka. As a result, the reader has an initial mental model of a distributed database and/or messaging system, both of which may be somewhat misleading.
FWIW, I'd suggest that a somewhat better mental model to put right up front might be a CSV file / spreadsheet table ... with row-level versioning. Whenever I try to explain dat I always start with that, because everyone's familiar with a table of data in Excel and can grok the idea of the table's row-by-row change history. Once that sinks in, I mention that the various versions of the table are "clone-able": so, table of data ... that's versioned ... and replicable.
Anyway ... you do a nice job of explaining dat, but I think putting the idea of a tabular-sheet-of-data-with-change-history front-and-center might be useful to newcomers looking for a simple, concrete mental model. Everything else (blob storage, the REST interface, etc.) can hang off that.
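To make that concrete, here's a toy sketch of the mental model (emphatically not dat's actual data structures, just the picture I have in mind):

```js
// Toy model of a table with row-level versioning; invented values, not dat's format.
var table = {
  chicago: [
    { version: 1, value: { population: 2714856 } },
    { version: 2, value: { population: 2722389 } } // a later edit to the same row
  ]
};

// The current state of the table is just each row's newest version...
var latest = table.chicago[table.chicago.length - 1].value;

// ...and "cloning" the table means replicating the full per-row history,
// so every version of every row travels with the data.
```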
@joyrexus awesome, thanks for your feedback. I made some quick edits, but I'd like to go through at some point and do a more thorough edit on that point!
Hi all, I love the spirit of this project. I agree with @joyrexus: an example of versioning the row and column changes of a simple, small CSV would be required to convince me of @dfockler's point that Dat has the focus or capacity to be "essentially git for...datasets."
Maybe I'm just missing something? I realize this is a nascent, enormous project, and like I said, I love the spirit of it, but it would take less, in a way, to get me to start playing with it, if that makes sense.
After a whole year of feedback from the community, we recently published a new version of dat. Would you mind trying out the new dat with `npm install -g dat`? You can read about the new dat announcement on the website and how it works in the docs.
I've been reading the various links and I can't find anything on what the hell dat is. All I've found is some high-level features mentioned on the dat homepage and then instructions on how to use it.
Is there a talk or something on it?