BD2KGenomics / s3am

A fast, parallel, streaming multipart uploader for S3

Implement `s3am get-sse-keys URL /PATH/TO/OUTPUT` #32

Open hannes-ucsc opened 8 years ago

hannes-ucsc commented 8 years ago

URL must have the form s3://BUCKET/ or s3://BUCKET/KEY, where KEY is interpreted as a key prefix if --prefix is given (to be consistent with s3am upload). All matching objects on S3 should be listed, and the master key should be used to derive a per-file key for each of them. Each object's URL and its per-file key should be written to the output file, one line per object, in the form s3://BUCKET/FULL/KEY\tSSE_KEY_BASE64, where SSE_KEY_BASE64 is the base64 encoding of the binary SSE key and \t is the tab character.
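
A rough sketch of the output format described above, in case it helps. The names `derive_key` and `sse_key_lines` are made up for illustration, and the derivation shown (SHA-256 over the master key followed by the object URL) is only a stand-in assumption; the real command would have to reuse whatever derivation s3am upload applies. Listing the bucket is left out, so the keys are passed in as a plain list.

```python
import base64
import hashlib


def derive_key(master_key, url):
    # ASSUMPTION: placeholder derivation, not confirmed by this thread.
    # Swap in the derivation that s3am upload actually uses.
    return hashlib.sha256(master_key + url.encode("utf-8")).digest()


def sse_key_lines(bucket, keys, master_key):
    """Return one 's3://BUCKET/FULL/KEY<TAB>SSE_KEY_BASE64' line per object key."""
    lines = []
    for key in keys:
        url = "s3://%s/%s" % (bucket, key)
        sse_key = derive_key(master_key, url)
        encoded = base64.b64encode(sse_key).decode("ascii")
        lines.append(url + "\t" + encoded)
    return lines
```

Writing the result would then be a matter of joining the lines with newlines and saving them to /PATH/TO/OUTPUT.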

The master key should be specified using --sse-key and friends but those options should be mandatory for get-sse-keys, despite the oxymoronic nature of that requirement.

arkal commented 8 years ago

I'll get to this tomorrow!

arkal commented 8 years ago

I suggest there be 4 options for URL

  1. S3://BUCKET/ - All keys in the bucket
  2. S3://BUCKET/KEY/ - All keys in the bucket in folder KEY (must point to a folder)
  3. S3://BUCKET/KEY - single KEY in the bucket (must point to a file)
  4. S3://BUCKET/KEY + --prefix - All keys in the bucket with prefix KEY

There is a small chance that the user may have something like

UCSC/
UCSC/file_1
UCSC/file_2
UCSC_manifest

With your suggestion, S3://BUCKET/UCSC --prefix gives you all 4. With mine, you can get just the files in UCSC by using S3://BUCKET/UCSC/

Might be worth considering?
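
To make the four cases concrete, here is a small sketch of the matching rules as I read them. `select` is a hypothetical helper, not existing s3am code, and it works on a plain list of key names rather than a real bucket listing. It assumes that a folder is matched by plain prefix comparison, so a placeholder key such as `UCSC/` is returned along with the files under it.

```python
# Example listing from above.
KEYS = ["UCSC/", "UCSC/file_1", "UCSC/file_2", "UCSC_manifest"]


def select(keys, url_key, prefix=False):
    """Pick the keys that a URL of the form S3://BUCKET/<url_key> would match."""
    if url_key == "":
        # Case 1: S3://BUCKET/ matches every key in the bucket.
        return list(keys)
    if url_key.endswith("/") or prefix:
        # Case 2 (folder) and case 4 (--prefix): match by key prefix.
        return [k for k in keys if k.startswith(url_key)]
    # Case 3: a single key, matched exactly.
    return [k for k in keys if k == url_key]
```

With this, `select(KEYS, "UCSC", prefix=True)` picks up `UCSC_manifest` as well, while `select(KEYS, "UCSC/")` does not.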