anowell / mia

Experimental Algorithmia CLI (no longer the official CLI)
41 stars 5 forks source link

Early fail if collection path is invalid #8

Open zeryx opened 8 years ago

zeryx commented 8 years ago

When uploading a file to a collection using algo cp, it doesn't check to make sure the path exists first, it would be incredibly nice if it did.

This is a problem when uploading large files; as in that kind of situation you'd normally prefer the upload attempt to fail early if there are any simple bugs like path name invalidity.

anowell commented 8 years ago

I like the idea. Checking that .parent() is a directory is sufficient and straight-forward, but there are a few caveats:`

Alternatively, we could try and create a zero-byte file before uploading. If the zero-byte file succeeds, then upload the larger file. I like the simplicity, but there is a question of correctness to this approach: What if creating the 0-byte file succeeds, but larger copy fails, now you have an empty file you didn't expect. We could try to delete it, but that could also fail. Also, what if some connector API doesn't allow 0-byte files?

Argoday commented 8 years ago

You could also extend HEAD with an additional response header specifying the ACL on the file path (irrespective of if the file exists) that way you can validate both path sanity and write permissions

anowell commented 8 years ago

That would address all the path and write permissions that WE manage. It won't help the case of IAM role restricting write access for s3:// data, but that may be a reasonable trade-off.

I also would really love HEAD to return ACLs (specifically: is a given directory listable?) for this AlgoFS issue: https://github.com/anowell/algorithmia-fuse/issues/1

pmcq commented 8 years ago

FWIW I think this is an issue with all of our clients - not sure if it's something that should go into spec?

anowell commented 8 years ago

Originally, I thought this really only applied to the CLI as a convenience for algo cp under the premise that users of the language clients could easily check before writing to a file. But given the complexity of checking, I'm increasingly willing to agree that clients could use some help making this determination before uploading large files. Some ideas:

  1. file.isWriteable()method. (client users could explicitly make this check if they want it)
  2. Have put and putFile auto-magically check and fail early (1 extra HEAD request for every write)
  3. Have put and putFile check and fail early, but only for large files

Opinions? Any of these would merit a change to the client spec. (I lean toward isWriteable because I tend to prefer explicit over magic that has overhead, and because I think we'll want to expose ACL data anyway over time)

Argoday commented 8 years ago

Both 1 and 3 :+1: (and a file.getAcl() / file.isReadable()) and for the connector hidden ACL issue you 'could' enable users to add that info to Algorithmia's DB as informational so those calls also have full ACL information ... but that's more a longer term thing to prioritize