Progress monitor during uploads
checkpresent has meaningful exit status
cache sha256 hashes and use them with checkpresent and sync
have archive list support --verbose option
allow archive upload to pass multiple files at once

Thank you for your work! I appreciate the quality of your pull request. I'm unavailable for the next three weeks or so, but I wanted to give you some quick feedback.

All of this looks good for merging in principle. I'd love to have these new features you've written, in particular the tree hash verification.

I'm concerned, though, about changing the behaviour of checkpresent. It exists solely for git-annex, and I'm concerned that changing the return value semantics could break git-annex in some way. Are you just looking for a mechanism to verify a file, or did you specifically want to change the checkpresent command? How about a new subcommand instead, such as just archive check?

The rest of the tree hash code looks great.

The verbose archive list option is also fine, but currently depends on the tree hash code, which depends on the checkpresent change.

The upload progress code looks fine, except for a couple of minor points. wget prints its progress to stderr, so I'm not sure if we should print this to stdout or stderr. I'd like to maintain parity with other similar commands to minimise user confusion. Right now I think I prefer stderr. Secondly, can we have some sensible behaviour if the input is a pipe? Perhaps just disable the progress indicator in that case?

The multiple upload files patch looks generally fine. A couple of questions:

archive_upload_subparser.add_argument('files', nargs="+")

Why not use:

archive_upload_subparser.add_argument('file', type=argparse.FileType('rb'), nargs='+')

We could still check for uniqueness against each file's name attribute, since that's the only type of uniqueness we care about. Perhaps error out if any of the names aren't unique, since then we can't have uniquely named uploads? Or, since Glacier permits duplicate names, perhaps we should too? Then we can maintain whatever standard errors argparse emits.

raise RuntimeError("Failed to open file '%s'" % file_name)

If we're catching all open failures and then just passing this through, then I think it's better to just pass the original exception through (ie. not catch it at all). Then the user will get a better indication of why it didn't work (eg. permission denied vs. file does not exist), rather than a mysterious "Failed" message. And we can get that functionality for free!

Sorry for the lengthy review. I feel that it's proportional to the size of your pull request :)

I'm happy to merge the smaller pieces that look fine right now, but they currently conflict with the bigger questions because of the ordering of the commits, and I don't have time to fix that up right now. I'll see what I can do when I'm back next month unless you get to it first. Can I suggest breaking this up into a larger number of smaller pull requests so that we can merge what we can and work on the rest?

Again, thanks for your work!

So somehow your project isn't showing up on GitHub right now (404 error)...which is strange. Hopefully it's just a temporary github problem. Anyways...

Thanks for the comments, I will definitely implement what you have suggested.

As for the behavior change of checkpresent, the intention was to allow someone to script glacier-cli (myself included). For example:

glacier-cli archive checkpresent

echo $?

This seemed more inline with the behavior of other core utilities. Since I don't grok Haskell it's hard for me to determine if the change in return status will break git-annex or not. I'll implement an alternate command check as you suggested. This will actually have the secondary benefit that check can expect to be passed a file and then someone could do something like this (kinda like md5sum -c)

glacier-cli archive check

file1: OK

file2: FAILED

file3: NOT PRESENT

file4: OK

glacier-cli: WARNING: 1 computed checksum did NOT match glacier-cli: WARNING: 1 file was NOT present

You are correct that wget sends progress to stderr; on the other hand, scpuses stdout. I don't have a preference, but I would tend to consider stdout more appropriate because when used within a cron job it's common to redirect > /dev/null and if something errant happens cron will capture the output and deal with it appropriately. If the progress goes to stderr then the user either gets tons of cron emails or they have to choose to ignore error message.

I considered using:

archive_upload_subparser.add_argument('file', type=argparse.FileType('rb'), nargs='+')

But shyed away from it because I was concerned that if a large number of files were passed on the command line, all of them would be opened simultaneously and be kept open for a potentially long duration while each file goes through it's upload progress.

I'll go back and make the commits a bit smaller and send you another pull request.

Thanks, ~Michael

On Tue, Oct 15, 2013 at 9:06 AM, basak notifications@github.com wrote:

Thank you for your work! I appreciate the quality of your pull request. I'm unavailable for the next three weeks or so, but I wanted to give you some quick feedback.

All of this looks good for merging in principle. I'd love to have these new features you've written, in particular the tree hash verification.

I'm concerned, though, about changing the behaviour of checkpresent. It exists solely for git-annex, and I'm concerned that changing the return value semantics could break git-annex in some way. Are you just looking for a mechanism to verify a file, or did you specifically want to change the checkpresent command? How about a new subcommand instead, such as just archive check?

The rest of the tree hash code looks great.

The verbose archive list option is also fine, but currently depends on the tree hash code, which depends on the checkpresent change.

The upload progress code looks fine, except for a couple of minor points. wget prints its progress to stderr, so I'm not sure if we should print this to stdout or stderr. I'd like to maintain parity with other similar commands to minimise user confusion. Right now I think I prefer stderr. Secondly, can we have some sensible behaviour if the input is a pipe? Perhaps just disable the progress indicator in that case?

The multiple upload files patch looks generally fine. A couple of questions:

archive_upload_subparser.add_argument('files', nargs="+")

Why not use:

archive_upload_subparser.add_argument('file', type=argparse.FileType('rb'), nargs='+')

We could still check for uniqueness against each file's name attribute, since that's the only type of uniqueness we care about. Perhaps error out if any of the names aren't unique, since then we can't have uniquely named uploads? Or, since Glacier permits duplicate names, perhaps we should too? Then we can maintain whatever standard errors argparse emits.

raise RuntimeError("Failed to open file '%s'" % file_name)

If we're catching all open failures and then just passing this through, then I think it's better to just pass the original exception through (ie. not catch it at all). Then the user will get a better indication of why it didn't work (eg. permission denied vs. file does not exist), rather than a mysterious "Failed" message. And we can get that functionality for free!

Sorry for the lengthy review. I feel that it's proportional to the size of your pull request :)

I'm happy to merge the smaller pieces that look fine right now, but they currently conflict with the bigger questions because of the ordering of the commits, and I don't have time to fix that up right now. I'll see what I can do when I'm back next month unless you get to it first. Can I suggest breaking this up into a larger number of smaller pull requests so that we can merge what we can and work on the rest?

Again, thanks for your work!

— Reply to this email directly or view it on GitHubhttps://github.com/basak/glacier-cli/pull/31#issuecomment-26315998 .

404 error

Apparently Github think I'm not human. I've contacted them and hope they'll restore it soon.

I like the output you suggest for archive check. Sounds good to me!

I didn't know scp goes to stdout. I'm not sure what the right answer there is now. I'll accept either for now I think (easy to change later anyway).

But shyed away from it because I was concerned that if a large number of
files were passed on the command line, all of them would be opened
simultaneously and be kept open for a potentially long duration while each
file goes through it's upload progress.

I think this is fine. There's no real cost to doing this, as an open file doesn't really tie anything up. If a user is supplying hundreds of files, he should break it down using xargs anyway - presumably before we reach the file descriptor limit. I think I'd prefer to stick to the argparse standard here, and change it only if we can identify a real issue with it.

I'll go back and make the commits a bit smaller and send you another pull
request.

It's not so much smaller commits. Your commits are absolutely fine. It's that you sent one big pull request in one go, instead of multiple smaller pull requests over time. That way we can build up some momentum since I hate having to hold pull requests. I'd prefer to get what I can merged, rather than having to hold the whole pull request back just because of a couple of minor issues.

Again, I really appreciate your contribution and look forward to merging this work. Thank you!

Sorry, I just realised that you've updated the pull request already! I will review as soon as I can.

Regarding:

I think I'd prefer to stick to the argparse standard here, and change it only if we can identify a real issue with it.

I found this interesting discussion when researching the proper coding pattern to use. The original author of argparse, Steven Bethard, states:

So I generally agree that FileType is not what you want for anything but quick scripts that can afford to either leave a file open or to close stdin and/or stdout.

Obviously, my inclination is to handle the file opening/closing explicitly...but this is your project so I'll defer to your judgement.

It's that you sent one big pull request in one go, instead of multiple smaller pull requests over time.

Gotcha, I'll subdivide the pull request into a few smaller ones. Feel free to close this one without merging.

Thanks!

basak / glacier-cli

Various improvements to glacier-cli #31

glacier-cli archive checkpresent

echo $?

glacier-cli archive check