TritonDataCenter / node-manta-sync

Rsync style command for Joyent's Manta

possible to split up for use with couchdb attachments? #4

Closed isaacs closed 9 years ago

isaacs commented 10 years ago

I'm building a thing to efficiently port the npm tarball attachments into Manta. It'd be nice if I could reuse some of the code here, so I'm thinking about breaking out the "fs" bits from the "sync based on size or md5 and optionally delete the remote" bits.

Do you think that this is a good idea? I figure that node-manta-sync and manta-couch-sync could then both be relatively thin wrappers around this core lib.

I'm mostly done with the couchdb attachment sync, so if you'd like, I can try to abstract out that bit, and send you a pull req to use it.

bahamas10 commented 10 years ago

Perfect, I like this a lot.

I was talking in #manta earlier today and it got me thinking about how to implement manta => local sync.

I think we're on the same page; I had the thought that the file listing as well as size and md5 calculations should be abstracted from this module. Basically, node-manta-sync would call some core lib to "list files", whether those files are on manta, on the local filesystem, in couchdb, on mars, it doesn't matter. The library would return an array of "file objects", which would contain the name, size, md5, and anything else that is necessary.
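A minimal sketch of that "list files" step for the local filesystem. It assumes Node's built-in fs and crypto modules and a file object shape of { name, size, md5 }. The field names and the base64 md5 encoding are assumptions, not something either module defines:

```javascript
'use strict';
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

// Hypothetical "list files" backend for the local filesystem.
// Returns an array of file objects { name, size, md5 }, where name is
// relative to root with '/' separators and md5 is base64-encoded (assumed).
function listLocalFiles(root) {
  const out = [];
  function walk(rel) {
    const abs = path.join(root, rel);
    for (const ent of fs.readdirSync(abs, { withFileTypes: true })) {
      const childRel = rel ? rel + '/' + ent.name : ent.name;
      if (ent.isDirectory()) {
        walk(childRel);
      } else if (ent.isFile()) {
        const data = fs.readFileSync(path.join(root, childRel));
        out.push({
          name: childRel,
          size: data.length,
          md5: crypto.createHash('md5').update(data).digest('base64'),
        });
      }
    }
  }
  walk('');
  // Sort by name so listings from different backends compare in order.
  out.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return out;
}
```

A Manta or CouchDB backend would return the same array shape, so the sync core never needs to know where the files live.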

The next thing would be to abstract the PUT step. So a local filesystem will use fs.createWriteStream, manta will use client.put, etc. That way, this program can effectively do manta => manta, manta => local, local => manta, and hell even local => local (like rsync).
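As a rough sketch of the abstracted PUT step, assume every backend exposes a hypothetical put(name, readable, cb). A local-filesystem writer could then wrap fs.createWriteStream like this. A Manta writer would wrap client.put in the same shape, but that is not shown here:

```javascript
'use strict';
const fs = require('fs');
const path = require('path');
const { pipeline } = require('stream');

// Hypothetical local backend with a put(name, readable, cb) method.
// The sync core only calls put(), so it can copy between any two stores.
function localWriter(root) {
  return {
    put(name, readable, cb) {
      const dest = path.join(root, name);
      // Create parent directories as needed (names may contain slashes).
      fs.mkdir(path.dirname(dest), { recursive: true }, (err) => {
        if (err) return cb(err);
        // pipeline() forwards errors from either stream and cleans up both.
        pipeline(readable, fs.createWriteStream(dest), cb);
      });
    },
  };
}
```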

If the core module that handles this abstraction is done properly, it shouldn't be hard to add the ability to wrap couchdb using this API, or any other data store.

Let me know if this is what you've explained above, or if I have missed the point completely.

isaacs commented 10 years ago

Yeah, that's kind of where I'm headed with this. Couchdb attachments are already an object of the format:

_attachments: {
  "some-file.txt": {
    "digest":"md5-deadbeef000000",
    "length":1234,
    "content_type":"text/plain"
  },
  ...
}

and some extra couchdb-specific stuff.
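For illustration, here is a hypothetical helper that flattens such an _attachments object into a plain per-filename map of sizes and digests. The output field names (size, md5, contentType) are assumptions. It also assumes CouchDB's "md5-" prefix convention for the digest:

```javascript
'use strict';

// Hypothetical adapter: CouchDB _attachments -> { filename: { size, md5, contentType } }.
function attachmentsToFiles(attachments) {
  const files = {};
  for (const [name, att] of Object.entries(attachments || {})) {
    const file = {};
    if (typeof att.length === 'number') file.size = att.length;
    // Keep only the part after "md5-"; leave md5 unset for other algorithms
    // so a sync can fall back to comparing sizes.
    if (typeof att.digest === 'string' && att.digest.startsWith('md5-')) {
      file.md5 = att.digest.slice('md5-'.length);
    }
    if (att.content_type) file.contentType = att.content_type;
    files[name] = file;
  }
  return files;
}
```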

I'm writing my generic sync module to be something like this:

sync({
  path: "/isaacs/stor/path/on/manta/to/stuff",
  client: myMantaClient,
  request: function(filename, cb) {
    // load up the stream, call cb(er, stream)
  },
  files: {
    "filename.txt": {
      // filename, size, md5 digest, etc.
      // all fields are optional, will use md5 if provided, then size,
      // then assume that all files must be written
      // if filename has slashes, then dirs will be made as needed
    },
    ...
  },
  "delete": true // default = no deletes
}, cb)

So, it's exposing a lot of detail, but should be easy to generate that object by reading the filesystem for you, or fetching from couchdb for me, or even reading from some other manta store (though, in that case, mlink is probably much better.)
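One possible reading of the comparison rules in the comments above, written as a hypothetical planSync() helper. It compares md5 when both sides have one, then size, and otherwise writes the file. It only deletes remote files when the delete option is set, and it lists the parent directories that slashed names need. The names and return shape are illustrative, not the module's actual API:

```javascript
'use strict';

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// `want` comes from the files map; `have` describes the remote copy (or undefined).
function needsWrite(want, have) {
  if (!have) return true;
  if (want.md5 && have.md5) return want.md5 !== have.md5;
  if (want.size !== undefined && have.size !== undefined) {
    return want.size !== have.size;
  }
  return true; // nothing to compare, so assume it must be written
}

// Hypothetical planner: what to put, what to delete, which dirs to make.
function planSync(files, remote, opts = {}) {
  const put = Object.keys(files).filter((name) =>
    needsWrite(files[name], has(remote, name) ? remote[name] : undefined));
  const del = opts.delete
    ? Object.keys(remote).filter((name) => !has(files, name))
    : [];
  const dirs = new Set();
  for (const name of put) {
    const parts = name.split('/').slice(0, -1);
    for (let i = 1; i <= parts.length; i++) {
      dirs.add(parts.slice(0, i).join('/'));
    }
  }
  // Lexicographic sort puts each parent directory before its children.
  return { put, delete: del, dirs: [...dirs].sort() };
}
```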

bahamas10 commented 10 years ago

Ah, ok. I see now, and this API makes it very simple. This way, the portion of manta-sync that syncs the files to manta can just be replaced with this sync module.

I like the visibility via logging that manta-sync supplies; it'd be great if the sync module could also be an event emitter, or take a callback, to let the user know whether each file was put successfully or failed.
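A rough sketch of the event-emitter idea, assuming a per-file putOne(name, cb) function. The event names 'file', 'fileError' and 'done' are placeholders, not an agreed API. Per-file failures avoid the 'error' event because an unhandled 'error' event throws:

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical: put files one at a time and report progress via events.
function runSync(names, putOne) {
  const ee = new EventEmitter();
  const failed = [];
  let i = 0;
  function next() {
    if (i >= names.length) return ee.emit('done', failed);
    const name = names[i++];
    putOne(name, (err) => {
      if (err) {
        failed.push(name);
        ee.emit('fileError', name, err);
      } else {
        ee.emit('file', name);
      }
      setImmediate(next); // avoid deep recursion if putOne calls back synchronously
    });
  }
  // Defer the start so callers can attach listeners to the returned emitter.
  process.nextTick(next);
  return ee;
}
```

A command-line wrapper can log on 'file' and 'fileError', while a library caller like the couchdb sync only needs 'done'.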

bahamas10 commented 10 years ago

ps. @sentientwaffle and I were just talking about npm and how far couchdb could go. One of us said that putting the data in manta would be pretty cool... it sounds like that's what's happening here :)

isaacs commented 10 years ago

Yeah, returning an event emitter would be great! I mostly just care about the final completion success, so that's all I'd considered, but yeah, on the command line that's huge. What event name would you like it to emit for each file?

isaacs commented 10 years ago

For your consideration: https://github.com/isaacs/cuttlefish http://npm.im/cuttlefish

isaacs commented 10 years ago

I'll add concurrency options, document the event names more properly, and then send you a pull req to close this tomorrow.
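For reference, a concurrency option could look roughly like this hypothetical eachLimit() helper. It runs worker(item, cb) over all items with at most `limit` calls in flight, calls done(err) exactly once, and stops starting new work after the first error. The name and signature are illustrative only:

```javascript
'use strict';

// Hypothetical concurrency limiter for per-file puts.
function eachLimit(items, limit, worker, done) {
  limit = Math.max(1, limit | 0);
  let started = 0;
  let finished = 0;
  let failed = false;
  if (items.length === 0) return process.nextTick(done, null);
  function launch() {
    while (!failed && started < items.length && started - finished < limit) {
      const item = items[started++];
      worker(item, (err) => {
        if (failed) return; // an earlier error already ended the run
        if (err) {
          failed = true;
          return done(err);
        }
        finished++;
        if (finished === items.length) return done(null);
        launch(); // a slot is free; start the next item
      });
    }
  }
  launch();
}
```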

bahamas10 commented 9 years ago

Closing: these projects have diverged.