HumanCellAtlas / data-store

Design specs and prototypes for the HCA Data Storage System (DSS, "blue box")
https://dss.staging.data.humancellatlas.org/

PUT file should accept date in more RFC3339 formats #1848

Open kislyuk opened 5 years ago

kislyuk commented 5 years ago

For the sake of usability, this should work:

hca dss put-file --uuid $(uuid) --version "$(date --rfc-3339=ns)" --creator-uid 1 --source-url s3://...
kislyuk commented 5 years ago

For reference, here is the mangling I had to do to get this through. In the process it drops the microsecond precision that gives versions their collision resistance:

hca dss put-file --uuid $(uuid) --version "$(date --utc --rfc-3339=seconds|sed -e 's/ /T/' -e 's/://g'|cut -d + -f 1).000000Z" --creator-uid 1 --source-url s3://...
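
As a hedged alternative to the sed/cut pipeline above, GNU date can emit the target layout directly through a custom format string, which also keeps the microsecond digits. This is only a sketch: the layout YYYY-MM-DDTHHMMSS.ffffffZ is inferred from the workaround in this thread rather than taken from the DSS API definition, the function name dss_version_now is hypothetical, and %6N (nanoseconds truncated to six digits) is a GNU coreutils extension that other date implementations may not support.

```shell
# Hypothetical helper: print the current UTC time as YYYY-MM-DDTHHMMSS.ffffffZ.
# Assumes GNU date; %6N truncates the nanosecond field to microseconds.
dss_version_now() {
  date --utc +%Y-%m-%dT%H%M%S.%6NZ
}

dss_version_now
```

The result could then be passed as --version "$(dss_version_now)" in the put-file call.
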
kozbo commented 4 years ago

@kislyuk hca dss create_version can produce a timestamp in a format that is valid for the DSS. Do we still need to modify the API so that it accepts other RFC3339 formats for the version?

kislyuk commented 4 years ago

@kozbo I filed this issue out of concern for the usability and interoperability of DSS with standard Linux tools, specifically date. It should intuitively be usable for generating a version when uploading to DSS (for example, in shell scripts), but its output is not usable without heavy editing. Ideally, the DCP CLI should not be strictly required for communicating with the DSS REST API: one should be able to script the interaction with a standard HTTP client like curl or httpie:

http PUT https://dss.data.humancellatlas.org/v1/files/$(uuidgen) Authorization:"Bearer $DCP_TOKEN" source_url=s3://public-test-bucket-idseq/blob_water_3_S23_R2_001.fastq.gz replica==aws creator_uid:=1 version=="$(date --rfc-3339=ns)"

The RFC3339 option is relevant here because RFC3339 is the most widely accepted standard for human-readable formatting of dates and timestamps, and it is referenced by the DSS API definition.

Instead, I have to do this to get my shell script to work:

http PUT https://dss.data.humancellatlas.org/v1/files/$(uuidgen) Authorization:"Bearer $DCP_TOKEN" source_url=s3://public-test-bucket-idseq/blob_water_3_S23_R2_001.fastq.gz replica==aws creator_uid:=1 version=="$(date --utc --rfc-3339=seconds|sed -e 's/ /T/' -e 's/://g'|cut -d + -f 1).000000Z"

This is a lot more cumbersome, and much harder to figure out for someone who is just getting started with the DSS API.
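
Until the API accepts more RFC3339 variants, a script could normalize an existing timestamp on the client side. The sketch below assumes GNU date, which can re-parse its own --rfc-3339 output through --date, and assumes the DSS expects the YYYY-MM-DDTHHMMSS.ffffffZ layout seen in the workaround above. The function name rfc3339_to_dss_version is hypothetical.

```shell
# Hypothetical helper: convert an RFC 3339 timestamp to YYYY-MM-DDTHHMMSS.ffffffZ.
# $1: timestamp such as the output of `date --rfc-3339=ns`.
# Assumes GNU date: --date parses the input, --utc normalizes the offset,
# and %6N truncates nanoseconds to microseconds.
rfc3339_to_dss_version() {
  date --utc --date="$1" +%Y-%m-%dT%H%M%S.%6NZ
}

# Example: a timestamp with a +02:00 offset is shifted to UTC.
rfc3339_to_dss_version "2019-03-01 12:34:56.123456789+02:00"
```

Because the conversion goes through --utc, timestamps with any offset are shifted to UTC before the Z suffix is attached, so the suffix stays truthful regardless of the local timezone.
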