RhodiumGroup / rhg_compute_tools

Tools for using compute.rhg.com and compute.impactlab.org
MIT License

Add CLI for `replicate_directory_structure_on_gcs()` #48

Closed: brews closed this 4 years ago

brews commented 4 years ago

This adds the basic structure for CLI tools. I used a CLI version of replicate_directory_structure_on_gcs() as a starting point.

Here is how it works. Assuming the user has installed the package via pip, conda, or similar, open a command line and run

rctools gcs repdirstruc dir1 gs://my-gcs-bucket

This copies any nested directory structure in "dir1" into "gs://my-gcs-bucket". You can see help with

rctools gcs repdirstruc --help

and this works for all subcommands, so

rctools --help

will list and briefly describe all of the subcommands available.
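For anyone new to click, the nested `rctools gcs repdirstruc` layout can be sketched as a top-level command group containing a `gcs` subgroup. This is a minimal sketch using click's standard group/command decorators, not the code from this PR: the `--credentials` option is hypothetical, and the command body only echoes where the real command would call replicate_directory_structure_on_gcs().

```python
import click


@click.group()
def cli():
    """Hypothetical top-level `rctools` command group."""


@cli.group()
def gcs():
    """Commands for working with Google Cloud Storage."""


@gcs.command()
@click.argument("src", type=click.Path(exists=True, file_okay=False))
@click.argument("dst")
@click.option("-c", "--credentials", default=None,
              help="Path to a GCS credentials file (hypothetical option).")
def repdirstruc(src, dst, credentials):
    """Replicate the directory structure of SRC onto DST (a gs:// URL)."""
    # Stub: the real command would call replicate_directory_structure_on_gcs() here.
    click.echo(f"replicating {src} -> {dst}")
```

click builds the `--help` text for each group and command from its docstring, which is how `--help` can work uniformly across subcommands. Registering `cli` as a console-script entry point named `rctools` in the package metadata is what would expose it on the command line after installation; the exact module path is package-specific.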

Note that I had to extend replicate_directory_structure_on_gcs() and change its signature so that the optional client_or_creds argument now accepts either an authorized client or a path to GCS credentials.
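Here is one way the extended signature and the directory replication could fit together. It is a sketch under stated assumptions, not the package's actual implementation: the helper names (`resolve_client`, `split_gs_url`, `directory_marker_names`, `replicate_directory_structure`) are hypothetical, `None` is assumed to mean application default credentials, and each local subdirectory is assumed to be mirrored as an empty object whose name ends in `/`, a common convention for representing folders in a bucket's flat namespace.

```python
import os


def resolve_client(client_or_creds=None):
    """Return a storage client from an existing client, a credentials path, or None."""
    if client_or_creds is None or isinstance(client_or_creds, (str, os.PathLike)):
        # Imported lazily so callers that pass their own client never need it here.
        from google.cloud import storage
        if client_or_creds is None:
            # Assumption: None falls back to application default credentials.
            return storage.Client()
        return storage.Client.from_service_account_json(os.fspath(client_or_creds))
    # Anything else is assumed to already be an authorized client.
    return client_or_creds


def split_gs_url(url):
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    if not url.startswith("gs://"):
        raise ValueError(f"expected a gs:// URL, got {url!r}")
    bucket, _, prefix = url[len("gs://"):].partition("/")
    return bucket, prefix.strip("/")


def directory_marker_names(src, prefix=""):
    """List the '<dir>/' object names that mirror every subdirectory under src."""
    names = []
    for dirpath, _dirnames, _filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        if rel == ".":
            continue  # the root itself maps onto the destination prefix
        parts = [p for p in (prefix, rel.replace(os.sep, "/")) if p]
        names.append("/".join(parts) + "/")
    return sorted(names)


def replicate_directory_structure(src, dst, client_or_creds=None):
    """Create one empty placeholder object per local subdirectory under dst."""
    client = resolve_client(client_or_creds)
    bucket_name, prefix = split_gs_url(dst)
    bucket = client.bucket(bucket_name)
    names = directory_marker_names(src, prefix)
    for name in names:
        bucket.blob(name).upload_from_string(b"")
    return names
```

Keeping the name computation separate from the upload means the directory walk can be tested without touching GCS, and passing a stub object as `client_or_creds` exercises the upload loop as well.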

This PR also includes a few other minor cleanups to HISTORY and the package metadata, and it adds click and pytest-mock as dependencies.

brews commented 4 years ago

I marked this as WIP partly because I'm not sure whether we (@delgadom) need other rctools gcs *** commands in this PR, or whether we need larger refactoring, docs, name changes, etc.

Now is a good time to look for big changes in the CLI. :-)

delgadom commented 4 years ago

This is awesome! Thank you @brews

RE: your question, I think this structure is very forward-compatible with other changes we might make, so I don't think any other changes necessarily have to go in this PR. That said, I think a handful of additional features would be nice:

Additionally, we should compare this against the existing directory structure replication in sync_gcs. I imagine the proposed implementation in replicate_directory_structure_on_gcs is faster, but I'm not sure. If the sync_gcs implementation is faster, we could provide a flag, or write logic, to determine whether the directory structure should be created using os or the Google Cloud Storage API.
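A rough harness for that speed comparison might look like the following. It assumes each approach is wrapped as a zero-argument callable that replicates the same tree into a scratch bucket; `time_implementations` and `fastest` are hypothetical names, and only Python's standard `timeit` and `statistics` modules are used.

```python
import statistics
import timeit


def time_implementations(implementations, repeat=5):
    """Time each zero-argument callable `repeat` times; return {name: median seconds}."""
    results = {}
    for name, func in implementations.items():
        # number=1: each call does real (network) work, so time single runs.
        timings = timeit.repeat(func, number=1, repeat=repeat)
        # Median rather than min, since network latency is part of the cost being compared.
        results[name] = statistics.median(timings)
    return results


def fastest(results):
    """Name of the implementation with the smallest median time."""
    return min(results, key=results.get)
```

If the comparison is not clear-cut, the choice could be exposed as a flag instead, for example a click option with `type=click.Choice(["os", "api"])`.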

Finally (and this should probably be a separate issue), a directory mv command would be extremely helpful and would reuse a lot of these elements. Renaming directories on GCS is currently super obnoxious, but most of the implementation is already in this package: it's just a directory walk that renames every blob.
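Because a bucket's namespace is flat, a directory mv reduces to renaming every blob under a prefix. A minimal sketch, assuming the google-cloud-storage `Bucket.list_blobs(prefix=...)` and `Bucket.rename_blob(blob, new_name)` methods; `move_directory` is a hypothetical name. `rename_blob` copies the object and then deletes the original, so the move as a whole is not atomic.

```python
def move_directory(bucket, src_prefix, dst_prefix):
    """Rename every blob under src_prefix so it sits under dst_prefix instead."""
    # Normalize to exactly one trailing slash so "old" does not also match "older/...".
    src_prefix = src_prefix.rstrip("/") + "/"
    dst_prefix = dst_prefix.rstrip("/") + "/"
    if dst_prefix.startswith(src_prefix):
        raise ValueError("destination must not be inside the source directory")
    moved = []
    # Materialize the listing first so renames don't interfere with iteration.
    for blob in list(bucket.list_blobs(prefix=src_prefix)):
        old_name = blob.name
        new_name = dst_prefix + old_name[len(src_prefix):]
        # Copy-then-delete under the hood in google-cloud-storage.
        bucket.rename_blob(blob, new_name)
        moved.append((old_name, new_name))
    return moved
```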

delgadom commented 4 years ago

And to clarify, I'm totally on board with leaving the above feature requests as issues and getting to them down the line. Not necessary to tackle them right now.