lol
As of right now, it seems like an easily shareable, syncing file format: essentially git for large datasets. It's structured a bit unusually because it's meant to hook into other scripts and applications that actually do the data transformations. It also seems like they've made a move toward sharing scientific data, which was not the original purpose. But the docs don't really explain a clear use case, and it's still in alpha, so that could have something to do with it.
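To make the "hooks into other scripts" part concrete, here's a minimal sketch of what one transform step in such a pipeline could look like (illustrative only; it assumes records flow between tools as newline-delimited JSON, and it isn't any particular dat API):

```js
// Illustrative sketch, not dat's API: a transform step in a Unix-style pipeline.
// Reads newline-delimited JSON on stdin, rewrites each record, emits it on stdout.
var readline = require('readline');

var rl = readline.createInterface({ input: process.stdin });

rl.on('line', function (line) {
  if (!line.trim()) return;
  var record = JSON.parse(line);
  record.name = String(record.name).toUpperCase(); // hypothetical transformation
  process.stdout.write(JSON.stringify(record) + '\n');
});
```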
@balupton http://maxogden.github.io/get-dat/
Haha! You made me lol.
There are 3 of us working on it now, and we're in the process of finishing beta. We'll update the website with some tutorials, getting started guides, and stuff like that. Thanks @dfockler @nichoth and @gobengo :)
Thanks everyone, the talk turned out to be the best resource for me, as it goes into the use case: what it does, when to use it, why it's important, etc. The walkthrough isn't that helpful initially, as it covers the how rather than the why, the what, and the what-ifs, which you need before making the investment in the how.
That being said, now that I know what it is, it seems really nifty. Keen to follow this project. Looking forward to the new website and marketing.
I can't quite figure out what the hell it is either... but it's really cool and has amazing amounts of potential.
It seems like a lot of the promised features are not implemented yet. See #296 and #300 for recent examples.
I found the 'get-dat' walkthrough very useful (and very technically impressive...).
I realized a cool use case, which is as a 'sink' for any script you would run that spits newline-delimited JSON to stdout. For example, JSON sourced from an API.
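A minimal sketch of that producer side might look like the following (the endpoint is hypothetical, and I've left out the dat end of the pipe since its import interface may still change):

```js
// Sketch of the producer side: fetch records from an API and emit
// newline-delimited JSON on stdout, ready to pipe into a sink like dat.
// The endpoint is hypothetical; any API returning a JSON array would do.
var https = require('https');

https.get('https://api.example.com/records', function (res) {
  var body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    JSON.parse(body).forEach(function (record) {
      process.stdout.write(JSON.stringify(record) + '\n');
    });
  });
});
```

The idea would then be to pipe that output straight into dat's import step on the command line.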
Last week I made something like that for work, livefyre-geo-collection. In the near future I'll take a stab at piping it into dat as a way to persist the results of the 'archive' command.
@balupton @nylen for some background, I just wrote this whitepaper draft: https://github.com/maxogden/dat/blob/master/docs/whitepaper.md
I'd be happy to receive questions/comments/PRs/suggestions/etc if you have the time! :)
@karissa cool, that was useful.
So... what do you do with the data once it is in dat?
It seems like the following:
Item 3 here seems more like it should be:
Or something along the lines of being able to work with the current latest/final data in our ideal structures for querying/rendering/etc.
My actual use case here is to develop a static site generator whose database is a local leveldb, mongodb, or pouchdb database, and which uses dat to import data into that local database from several external sources (prismic.io, wordpress.com, ghost.org, tumblr.com, soundcloud.com, etc.), allowing a person's online data to be consolidated and rendered nicely as an always up-to-date personal portfolio website. Idea brief.
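To sketch the import step of that idea (purely hypothetical: it assumes the records arrive as newline-delimited JSON on stdin, e.g. exported from dat, with source and id fields on each record):

```js
// Hypothetical import step for the static-site idea: read newline-delimited
// JSON records on stdin and persist them into a local leveldb.
var level = require('level');
var readline = require('readline');

var db = level('./site-data');
var rl = readline.createInterface({ input: process.stdin });

rl.on('line', function (line) {
  if (!line.trim()) return;
  var record = JSON.parse(line);
  // 'source' and 'id' are assumed fields marking where each record came from
  // (wordpress.com, soundcloud.com, ...) so the generator can query by source.
  db.put(record.source + '!' + record.id, JSON.stringify(record), function (err) {
    if (err) console.error('failed to store record', err);
  });
});
```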
@karissa thanks for the whitepaper. I found it helpful.
I'd like to offer some general feedback: You close the first section with ...
> We introduce Dat, a version-controlled distributed database and data tool that has the user interface of a version control system (VCS).
... and follow with a nice summary of key features. You then compare/contrast w/ Kafka. As a result, the reader has an initial mental model of a distributed database and/or messaging system, both of which may be somewhat misleading.
FWIW, I'd suggest that a somewhat better mental model to put right up front might be a CSV file / spreadsheet table ... with row-level versioning. Whenever I try to explain dat I always start with that, because everyone's familiar with a table of data in Excel and can grok the idea of the table's row-by-row change history. Once that sinks in, I mention that the various versions of the table are "clone-able": so, table of data ... that's versioned ... and replicable.
Anyway ... you do a nice job of explaining dat, but I think putting the idea of a tabular-sheet-of-data-with-change-history front-and-center might be useful to newcomers looking for a simple, concrete mental model. Everything else (blob storage, the REST interface, etc.) can hang off that.
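To make that concrete, here's a toy sketch of the mental model (emphatically not dat's actual data structures, just the picture I have in mind):

```js
// Toy model of a table with row-level versioning; invented values, not dat's format.
var table = {
  chicago: [
    { version: 1, value: { population: 2714856 } },
    { version: 2, value: { population: 2722389 } } // a later edit to the same row
  ]
};

// The current state of the table is just each row's newest version...
var latest = table.chicago[table.chicago.length - 1].value;

// ...and "cloning" the table means replicating the full per-row history,
// so every version of every row travels with the data.
```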
@joyrexus awesome, thanks for your feedback. I made some quick edits, but I'd like to go through at some point and do a more thorough edit on that point!
Hi all, I love the spirit of this project. I agree with @joyrexus: an example of versioning the row and column changes of a simple, small CSV would be required to convince me of @dfockler's point that Dat has the focus or capacity to be "essentially git for...datasets."
Maybe I'm just missing something? I realize this is a nascent, enormous project, and like I said, I love the spirit of it, but it would take less, in a way, to get me to start playing with it, if that makes sense.
After a whole year of feedback from the community, we recently published a new version of dat. Would you mind trying out the new dat with `npm install -g dat`? You can read about the new dat announcement on the website and how it works in the docs.
I've been reading the various links and I can't find anything on what the hell dat is. All I've found is some high-level features mentioned on the dat homepage and then instructions on how to use it.
Is there a talk or something on it?