isaacs closed this issue 9 years ago.
Perfect, I like this a lot.
I was talking in #manta earlier today and it got me thinking about how to implement manta => local sync.
I think we're on the same page. I had the thought that file listing, as well as size and md5 calculation, should be abstracted out of this module. Basically, `node-manta-sync` would call some core lib to "list files", and it wouldn't matter whether those files are on Manta, on the local filesystem, in CouchDB, or on Mars. The library would return an array of "file objects", each containing the name, size, md5, and anything else that is necessary.
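A minimal sketch of the "list files" half for the local filesystem case. `listLocalFiles` and the `{ name, size, md5 }` field names are assumptions based on the description above, not an existing library API. For brevity it reads each file fully into memory to hash it; a real implementation would more likely use a streaming hash.

```javascript
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");

// Hypothetical lister: walks a local directory tree and returns an array of
// "file objects" ({ name, size, md5 }). Names are relative to root and always
// use "/" separators so they compare cleanly with remote (e.g. Manta) paths.
function listLocalFiles(root) {
  const files = [];
  (function walk(dir) {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const data = fs.readFileSync(full); // whole file in memory: sketch only
        files.push({
          name: path.relative(root, full).split(path.sep).join("/"),
          size: data.length,
          md5: crypto.createHash("md5").update(data).digest("hex"),
        });
      }
    }
  })(root);
  return files;
}
```

A Manta or CouchDB lister would return the same shape, so the sync logic never needs to know where the listing came from.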
The next thing would be to abstract the PUT step. So a local filesystem will use `fs.createWriteStream`, Manta will use `client.put`, etc. That way, this program can effectively do manta => manta, manta => local, local => manta, and hell, even local => local (like `rsync`).
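The PUT abstraction could be as small as one method per backend. The sketch below assumes a hypothetical `put(name, readableStream, cb)` interface and implements it for the local filesystem with `fs.createWriteStream`; a Manta-backed destination would wrap the Manta client's `put` in the same shape, which is not shown here because its exact signature isn't given in this thread.

```javascript
const fs = require("fs");
const path = require("path");
const { pipeline } = require("stream");

// Hypothetical "destination" interface: every backend exposes
// put(name, readableStream, cb). This one writes files under a local root.
// Note: no path-traversal checks (e.g. "../") in this sketch.
function localDestination(root) {
  return {
    put(name, source, cb) {
      const target = path.join(root, ...name.split("/"));
      // Create parent directories as needed, then stream the data to disk.
      fs.mkdir(path.dirname(target), { recursive: true }, (err) => {
        if (err) return cb(err);
        // pipeline() forwards errors from either stream to cb.
        pipeline(source, fs.createWriteStream(target), cb);
      });
    },
  };
}
```

Because source and destination only meet through a readable stream, any lister/reader can be paired with any destination, which is what makes local => local work for free.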
If the core module that handles this abstraction is done properly, it shouldn't be hard to wrap CouchDB, or any other data store, with this API.
Let me know if this is what you've explained above, or if I have missed the point completely.
Yeah, that's kind of where I'm headed with this. CouchDB attachments are already an object of this form, plus some extra CouchDB-specific fields:
```javascript
_attachments: {
  "some-file.txt": {
    "digest": "md5-deadbeef000000",
    "length": 1234,
    "content_type": "text/plain"
  },
  // ...one entry per attachment
}
```
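Since the attachments map is already keyed by filename, turning it into the `files` object for `sync` is mostly a rename. The sketch below assumes the `files` entries use hypothetical `size` and `md5` (hex) fields, and assumes CouchDB's digest format, where the part after `md5-` is a base64-encoded MD5 (the `deadbeef` value above is just a placeholder).

```javascript
// Hypothetical adapter: convert a CouchDB _attachments map into a `files`
// object keyed by filename. Assumes digests look like "md5-<base64>" and
// re-encodes them as hex; attachments without an md5 digest keep only size.
function attachmentsToFiles(attachments) {
  const files = {};
  for (const [name, att] of Object.entries(attachments || {})) {
    const file = { size: att.length };
    if (typeof att.digest === "string" && att.digest.startsWith("md5-")) {
      file.md5 = Buffer.from(att.digest.slice(4), "base64").toString("hex");
    }
    files[name] = file;
  }
  return files;
}
```

CouchDB-specific fields such as `content_type` are simply dropped here; an adapter could pass them through if the destination needs them.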
I'm writing my generic sync module with roughly this API:
```javascript
sync({
  path: "/isaacs/stor/path/on/manta/to/stuff",
  client: myMantaClient,
  request: function (filename, cb) {
    // open a readable stream for filename, then call cb(er, stream)
  },
  files: {
    "filename.txt": {
      // filename, size, md5 digest, etc.
      // All fields are optional: md5 is compared if provided, then size;
      // with neither, the file is assumed to need writing.
      // If the filename has slashes, directories are created as needed.
    }
    // ...more files
  },
  "delete": true // default: no deletes
}, cb);
```
So it exposes a lot of detail, but it should be easy to generate that object by reading the filesystem for you, fetching from CouchDB for me, or even reading from some other Manta store (though in that case, mlink is probably much better).
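The comparison rules in the `files` comments can be sketched as a pure planning step that decides what to write and what to delete before any I/O happens. `planSync` and its `{ put, remove }` result are hypothetical names, and `remote` is assumed to be a listing of the destination in the same shape as `files`.

```javascript
// Hypothetical planner: md5 is compared when both sides have it, otherwise
// size, otherwise the file is assumed to need writing. Files present only
// remotely are deleted only when opts.delete is true (default: no deletes).
function planSync(files, remote, opts = {}) {
  const put = [];
  const remove = [];
  for (const [name, want] of Object.entries(files)) {
    const have = remote[name];
    if (!have) {
      put.push(name); // missing on the destination
    } else if (want.md5 && have.md5) {
      if (want.md5 !== have.md5) put.push(name);
    } else if (want.size !== undefined && have.size !== undefined) {
      if (want.size !== have.size) put.push(name);
    } else {
      put.push(name); // nothing to compare, so write it
    }
  }
  if (opts.delete) {
    for (const name of Object.keys(remote)) {
      if (!Object.prototype.hasOwnProperty.call(files, name)) remove.push(name);
    }
  }
  return { put, remove };
}
```

Keeping the plan separate from the transfers also makes a dry-run mode trivial: print the plan and stop.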
Ah, ok. I see now, and this API makes it very simple. This way, the portion of `manta-sync` that syncs the files to manta can just be replaced with this `sync` module.
I like the visibility via logging that `manta-sync` supplies; it'd be great if the `sync` module could also be an event emitter, or take a callback to fire, to let the user know whether each file was put successfully or failed.
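Per-file reporting could look like the sketch below, which puts files one at a time through a caller-supplied `putFile(name, cb)` and returns an `EventEmitter`. The function name and the event names `"file"`, `"fileError"`, and `"end"` are placeholders, not an agreed API.

```javascript
const { EventEmitter } = require("events");

// Hypothetical sketch: put each file in turn and report progress through an
// EventEmitter. Per-file failures use "fileError" rather than "error",
// because emitting "error" with no listener attached throws.
function syncWithEvents(names, putFile) {
  const ee = new EventEmitter();
  const failed = [];
  let i = 0;
  function next() {
    if (i === names.length) {
      ee.emit("end", { total: names.length, failed });
      return;
    }
    const name = names[i++];
    putFile(name, (err) => {
      if (err) {
        failed.push(name);
        ee.emit("fileError", name, err);
      } else {
        ee.emit("file", name);
      }
      setImmediate(next); // avoid deep recursion if putFile calls back synchronously
    });
  }
  setImmediate(next); // start later so callers can attach listeners first
  return ee;
}
```

A command-line wrapper would log on `"file"` and `"fileError"`, while a library caller who only cares about the final result can listen for `"end"` alone.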
P.S. @sentientwaffle and I were just talking about npm and how far CouchDB could go. One of us said that putting the data in Manta would be pretty cool... it sounds like that's what's happening here :)
Yeah, returning an event emitter would be great! I mostly care only about whether the whole sync succeeded, so that's all I'd considered, but on the command line, per-file feedback is huge. What event name would you like it to emit for each file?
For your consideration: https://github.com/isaacs/cuttlefish http://npm.im/cuttlefish
I'll add concurrency options, document the event names properly, and then send you a pull req to close this tomorrow.
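A concurrency option usually means capping how many PUTs are in flight at once. The sketch below is a generic callback-style limiter; `forEachLimit` is a hypothetical name, and this is not cuttlefish's actual code or option name.

```javascript
// Hypothetical helper: call worker(item, cb) for every item with at most
// `limit` calls in flight, then call done(errors) once all have finished.
// Per-item errors are collected instead of aborting the whole run.
function forEachLimit(items, limit, worker, done) {
  const max = Math.max(1, Math.floor(limit) || 1);
  const errors = [];
  let next = 0;
  let running = 0;
  let finished = 0;
  if (items.length === 0) {
    process.nextTick(done, errors);
    return;
  }
  function launch() {
    // Start workers until the limit is reached or items run out.
    while (running < max && next < items.length) {
      const item = items[next++];
      running++;
      worker(item, (err) => {
        running--;
        finished++;
        if (err) errors.push({ item, err });
        if (finished === items.length) done(errors);
        else launch(); // a slot freed up: start the next item
      });
    }
  }
  launch();
}
```

Collecting errors rather than stopping at the first one fits a sync tool: one bad file shouldn't prevent the rest from being uploaded.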
Closing; these projects have diverged.
I'm building a thing to efficiently port the npm tarball attachments into Manta. It'd be nice if I could reuse some of the code here, so I'm thinking about breaking out the "fs" bits from the "sync based on size or md5 and optionally delete the remote" bits.
Do you think that this is a good idea? I figure that node-manta-sync and manta-couch-sync could then both be relatively thin wrappers around this core lib.
I'm mostly done with the couchdb attachment sync, so if you'd like, I can try to abstract out that bit, and send you a pull req to use it.