TritonDataCenter / node-manta

Node.js SDK for Manta
75 stars 54 forks source link

`mmpu commit` could make zero-byte commits more obvious to users #338

Open jordanhendricks opened 6 years ago

jordanhendricks commented 6 years ago

A frequent mistake new users of mmpu make is not specifying part etags when committing the object:

# create a multipart upload and upload some parts
$ mmpu create ~~/stor/my-obj.txt
bde8ee50-dcdb-6d09-a2e8-afcfcee440d3
$ mmpu upload bde8ee50-dcdb-6d09-a2e8-afcfcee440d3 0 -f part1.txt
5354ee1e-0f37-4840-dc60-bfabc5e1f8f9
$ mmpu upload bde8ee50-dcdb-6d09-a2e8-afcfcee440d3 1 -f part2.txt
d44c62a7-256a-c351-957f-d5741b25e80c

# try to commit the object (but this commits an empty object)
$ mmpu commit bde8ee50-dcdb-6d09-a2e8-afcfcee440d3

# "Where's my object?"
$ mget ~~/stor/my-obj.txt
$

The MPU API requires part etags because parts are normal Manta objects and can thus be overwritten. Specifying the etag ensures the client is committing the correct object. To mirror the API, the mmpu CLI requires the etags the user wants to commit.

Using the CLI is well-documented (in both the mmpu commit help message and the mmpu man page). Still, given that so many users expect mmpu commit to commit parts they've uploaded without the additional etag specification step, it would be good to improve the usability of this command.

jordanhendricks commented 6 years ago

There are a couple ways I have thought about doing this:

  1. When a user tries to commit an empty object, prompt them to ensure they actually meant to do that. Something like the following:
# committing zero-byte object intentionally
$ mmpu commit bde8ee50-dcdb-6d09-a2e8-afcfcee440d3
mmpu: commit empty object? y

# committing zero-byte object unintentionally
$ mmpu commit bde8ee50-dcdb-6d09-a2e8-afcfcee440d3
mmpu: commit empty object? n

The upside of this approach is that it would likely prevent all users from ever making the mistake above. The downside is that we would be converting the CLI from non-interactive to interactive, which I am not really a fan of (and may require a version bump).

  1. Add a flag to mmpu commit that indicates you want the CLI to fetch the etags of an MPU's parts as they currently exist in Manta so that you don't have to specify them yourself.

Something like:

$ mmpu commit -e bde8ee50-dcdb-6d09-a2e8-afcfcee440d3

This approach adds some latency to the CLI invocation, as it will need to make additional requests to fetch the upload directory entries. (With an API max of 10000 parts per multipart upload, and 1000 results max per directory listing, this means at worst 10 additional requests for a given commit.)

It does not necessarily prevent first-time users from accidentally committing an empty object, but a flag would appear in the flag list in the help message, which would hopefully mean it's more likely to be noticed by a user running mmpu commit -h. Additionally, it would not change the CLI from non-interactive to interactive, as option 1 does.

With those tradeoffs in mind, I'm inclined to choose option 2. I am open to other ideas as well!

davepacheco commented 6 years ago

Is it just as important for the CLI that the user be very explicit about which parts they want? I wonder if we could make your suggested flag the default behavior and only require the full list of parts if the list that's on the server is ambiguous in some way. We could still allow people to specify specific etags.

jordanhendricks commented 6 years ago

@davepacheco: I don't think it's as important in most cases for the CLI. Having the flag on by default would be a breaking change, though.

I would also say I think it makes the most sense to have the flag on by default in the case that no etags are given (mmpu commit <id>), and if etags are specified, (mmpu commit <id> <etag0> <etag1> ...), then to assume those are the etags the user wants to use. This would prevent breakage for consumers doing non-zero-byte commits.

davepacheco commented 6 years ago

Yes, I agree that if the user specified etags, we should just use those. Given that, I think the only cases where this would come up are cases where the command would have failed anyway, so I don't think any correct usage would be affected. Right?

jordanhendricks commented 6 years ago

After some discussion offline, we came up with a scheme we think makes sense:

  1. By default, invoking mmpu commit <id> will look up the etags of all parts uploaded for the MPU and use those in the commit request. This behavior makes sense as the default, as it's what users seem to expect, and it has the added benefit of preventing user error when keeping track of etags of parts manually. It will also reflect the current behavior of creating a zero-byte object for MPUs that have no parts uploaded, as mmpu will see that no parts have been uploaded, and commit no parts accordingly.
  2. If the user specifies etags (e.g., mmpu commit <id> <etag0> <etag1>), then mmpu will not look up the etags for them and simply submit the ones provided. This is consistent with the current behavior of mmpu, allowing scripts that commit non-empty subsets of parts uploaded to an MPU to work.
  3. If for some reason, the user uploaded parts, but wants to commit an empty object, they will need to override the default behavior with an additional flag. E.g.,: mmpu commit --commit-empty. This will break users that expect invoking mmpu commit <id> to commit a zero-byte object even if parts are uploaded to create an empty object.

This scheme requires a major version bump because of item (3): Users expecting mmpu commit <id> to commit an empty object for an MPU that has parts uploaded to it will be broken by this change.

Committing a zero-byte object after uploading parts to the MPU is a rare case as far as I know. Additionally, this scheme seems like it will improve the usability for the vast majority of new users without breaking the most common uses of mmpu --- committing parts uploaded to an MPU by specifying their etags manually. As such, I think it's a good approach to solving this.