anowell / mia

Experimental Algorithmia CLI (no longer the official CLI)

Support downloading/uploading entire data directory #2

Open anowell opened 9 years ago

anowell commented 9 years ago

Would like to download an entire data directory via: algo download <data-directory-uri> [local-directory]

and be able to do so with concurrency similar to how algo upload works.

(Waiting for #1 to replace how concurrency works)
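A minimal Python sketch of a concurrent directory download, assuming the file listing and the per-file fetch are supplied as callables. `download_directory`, `list_files` and `fetch` are hypothetical names standing in for whatever Data API client the CLI uses; they are not part of any real Algorithmia library. A thread pool bounds concurrency to `workers` files in flight at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def download_directory(
    list_files: Callable[[str], Iterable[str]],  # hypothetical: names under remote_dir
    fetch: Callable[[str], bytes],               # hypothetical: bytes of one remote file
    remote_dir: str,
    local_dir: str,
    workers: int = 8,
) -> list[str]:
    """Download every file listed under remote_dir into local_dir concurrently."""
    os.makedirs(local_dir, exist_ok=True)
    names = list(list_files(remote_dir))

    def download_one(name: str) -> str:
        # Each worker fetches one remote file and writes it to the matching local path.
        data = fetch(f"{remote_dir.rstrip('/')}/{name}")
        path = os.path.join(local_dir, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)  # allow names like "sub/a.txt"
        with open(path, "wb") as f:
            f.write(data)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order and re-raises the first worker exception.
        return list(pool.map(download_one, names))
```

Injecting the listing and fetch functions keeps the concurrency logic independent of the transport, so the same pool could drive uploads by swapping the callables.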

Argoday commented 9 years ago

Might just use 'sync' instead of upload / download. (Email reply to the Aug 28, 2015 post above.)

anowell commented 9 years ago

I'd actually like to use 'cp'... I think it's the same as your suggested 'sync', but semantically more like scp.

algo cp [-r] <source>... <dest> (bonus points for aliasing to acp)
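The proposed argument shape (one or more sources followed by a single destination) maps directly onto Python's `argparse`, which can split a variadic positional from a trailing positional. This is only an illustration of the syntax under discussion; `build_cp_parser` is a hypothetical name and not how the actual CLI parses its arguments.

```python
import argparse


def build_cp_parser() -> argparse.ArgumentParser:
    """Parser for the proposed: algo cp [-r] <source>... <dest> (sketch)."""
    parser = argparse.ArgumentParser(prog="algo cp")
    # -r: recurse into directories, as in cp/scp.
    parser.add_argument("-r", dest="recursive", action="store_true")
    # nargs="+" takes every positional except the last, which becomes dest.
    parser.add_argument("sources", nargs="+", metavar="source")
    parser.add_argument("dest")
    return parser
```

For example, `build_cp_parser().parse_args(["-r", "a.txt", "b.txt", "anowell/foo"])` yields `sources == ["a.txt", "b.txt"]` and `dest == "anowell/foo"`.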

However, if you allow either source or dest to arbitrarily be a data URI, then you need the data:// protocol prefix to resolve ambiguity (.my/foo could be a local directory). I don't like the ergonomics of forcing data:// on the command-line, so I favor deducing user intent in the most common, least ambiguous cases. I propose these ambiguity resolution rules (that only apply if none of the args use a "data://" prefix):

  1. If a single source is specified
    • If source resolves locally, but dest does not and dirname(dest) is not a local directory: upload
    • If dest resolves locally, but source does not: download
  2. If multiple sources are specified
    • If all sources resolve locally and dest does not: upload
    • If dest resolves locally, but any sources do not: download
  3. All other cases are ambiguous and result in a warning to explicitly use data:// prefix
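The three rules above can be written as a small decision function. This is a sketch under stated assumptions: "resolves locally" is taken to mean `os.path.exists`, the filesystem checks are injectable for testing, and `resolve_direction` is a hypothetical name. It returns `None` for the ambiguous cases, where the caller would print the warning asking for an explicit data:// prefix.

```python
import os
from typing import Callable, Optional, Sequence


def resolve_direction(
    sources: Sequence[str],
    dest: str,
    exists: Callable[[str], bool] = os.path.exists,
    isdir: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Return "upload", "download", or None when the paths are ambiguous."""
    # The heuristic only applies when no argument carries an explicit data:// prefix.
    if not sources or any(p.startswith("data://") for p in [*sources, dest]):
        raise ValueError("heuristic applies only to unprefixed, non-empty arguments")
    dest_local = exists(dest)
    if len(sources) == 1:
        src_local = exists(sources[0])
        # Rule 1: source is local, dest is not, and dest's parent is not a local dir.
        if src_local and not dest_local and not isdir(os.path.dirname(dest)):
            return "upload"
        # Rule 1: dest is local, source is not.
        if dest_local and not src_local:
            return "download"
    else:
        # Rule 2: every source is local and dest is not.
        if all(exists(s) for s in sources) and not dest_local:
            return "upload"
        # Rule 2: dest is local and at least one source is not.
        if dest_local and any(not exists(s) for s in sources):
            return "download"
    # Rule 3: everything else is ambiguous.
    return None
```

With only `myfile.txt` present locally, `resolve_direction(["myfile.txt"], "anowell/foo")` returns `"upload"`, because `anowell/foo` does not exist and `anowell` is not a local directory. If a local `anowell/` directory did exist, the same call would return `None`, which is exactly the dependence on local state objected to below.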

Notes:

Argoday commented 9 years ago

'cp' ~= 'sync' , either term is good

I don't like going down the route of solving the ambiguity problem here. Just require one side or both to have the data:// prefix. It is: 1) simple, 2) not surprising, 3) doesn't depend on local state to know what it does.
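For contrast, the explicit-prefix approach needs no filesystem access at all: direction follows purely from which arguments start with data://. A minimal sketch, assuming all sources must share the same kind and that "both sides prefixed" means a remote-to-remote copy; `explicit_direction` and the `"remote-copy"` label are hypothetical names.

```python
PREFIX = "data://"


def explicit_direction(sources: list[str], dest: str) -> str:
    """Decide direction from data:// prefixes alone (no local state consulted)."""
    if not sources:
        raise ValueError("at least one source is required")
    remote_srcs = [s.startswith(PREFIX) for s in sources]
    remote_dest = dest.startswith(PREFIX)
    if all(remote_srcs) and remote_dest:
        return "remote-copy"  # both sides prefixed
    if all(remote_srcs):
        return "download"     # remote sources, local dest
    if remote_dest and not any(remote_srcs):
        return "upload"       # local sources, remote dest
    # Neither side prefixed, or a mix of local and remote sources.
    raise ValueError("prefix remote paths with data:// to disambiguate")
```

The same inputs always produce the same answer regardless of what exists on disk, which is the determinism argument made here.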

Argoday commented 9 years ago

Note: scp is fully deterministic and does not rely on local state, instead choosing to use remote decorators.

anowell commented 9 years ago

4) it's also the only command in the entire algo utility that would require the data:// prefix.

I also don't like the dependence on local state. This is why I've punted so far on implementing 'cp' and stuck with separate 'download' and 'upload' commands. But I also prefer that a CLI tool assumes intuitive/expected behavior in the sloppy cases (as long as it provides a way to be explicit, e.g. curl guesses protocol if not specified, scp assumes username based on local state if not specified).

I imagine this as a first experience with the Data API from the CLI:

$ algo ls
anowell
$ algo ls anowell
$ algo mkdir anowell/foo
Created directory data://anowell/foo
$ algo ls anowell
foo
$ algo cp myfile.txt anowell/foo
Warn: potentially ambiguous paths - prefix remote paths with data:// to avoid this warning.
Uploaded data://anowell/foo/myfile.txt

I just find myself thinking: "Why make that an error and force them to re-type it, when we can confidently know what they intended?"

anowell commented 9 years ago

Of course, for the sake of arguing with myself:

5) Adding ambiguity resolution is backward compatible; removing it is not.