Open danbills opened 6 years ago
re 5: Write access:
we have to stage stuff in CWL too - maybe even more so than we do in WDL
Regardless, my comment would be that we don't necessarily need to write to the same FS that inputs are coming from, e.g. if we're running on PAPI we could "write" to gs://... even if most inputs are coming from https://...
re 6: hashes besides CRC32
- yes we can use anything. Only if we want to call cache between tasks from different FS's do we need to standardize.
That's not been a problem for now between local (md5) and GCS (CRC32C) because we'd never call cache between local and PAPI anyway
3: call caching: in order to make the cache hit, we only need to obtain the MD5 from the input and match it to something run before. If we can get this from the supplied psURL then we have the md5 and can match internally.
As long as we are not using psURLs as destinations (e.g. we are still writing task outputs to the cromwell execution bucket) performing the "hit" (e.g. doing the copy/reference) shouldn't be affected by psURLs.
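To illustrate the matching step described in the comments above, here is a minimal sketch of a call cache keyed by task name plus input hashes. It is not Cromwell's implementation: `CallCache`, the task name `align`, and the output path are hypothetical. The hash algorithm is part of the key, which is why an md5 recorded locally would never match a crc32c reported by GCS.

```python
from typing import Dict, Optional, Tuple

# (algorithm, digest) pair, e.g. ("md5", "09f7e0...") or ("crc32c", "sMnOMw==")
FileHash = Tuple[str, str]
CacheKey = Tuple[str, Tuple[FileHash, ...]]


class CallCache:
    """Toy in-memory call cache (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._entries: Dict[CacheKey, str] = {}

    def record(self, task: str, inputs: Tuple[FileHash, ...], outputs_path: str) -> None:
        # Remember where the outputs of a finished call were written.
        self._entries[(task, inputs)] = outputs_path

    def lookup(self, task: str, inputs: Tuple[FileHash, ...]) -> Optional[str]:
        # A hit requires the same task and the same (algorithm, digest) pairs.
        return self._entries.get((task, inputs))


cache = CallCache()
md5_input = (("md5", "09f7e02f1290be211da707a266f153b3"),)
crc_input = (("crc32c", "sMnOMw=="),)

# Hypothetical output location in the execution bucket.
cache.record("align", md5_input, "gs://cromwell-execution/align/out")

hit = cache.lookup("align", md5_input)   # same hash type and value: hit
miss = cache.lookup("align", crc_input)  # different hash type: no hit
```

Under this sketch, a presigned URL only has to yield the input's md5 for the lookup to work; where the input bytes live does not enter the key.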
1: on Google, generating a psURL and calling HEAD on it (which you can also do with a GET that only asks for the 1st byte) returns:
HTTP/2 200
x-guploader-uploadid: AEnB2Uo10d8ECr7tR5601R8roi8MIXlzvg1rjyMui9wavFC7KO2Pv2QBk94Qv22mgAz5Ih0nnayc2kXj5XBFgRUqkNTJNtAo7Q
expires: Fri, 29 Jun 2018 15:56:42 GMT
date: Fri, 29 Jun 2018 15:56:42 GMT
cache-control: private, max-age=0
last-modified: Fri, 29 Jun 2018 15:53:49 GMT
etag: "09f7e02f1290be211da707a266f153b3"
x-goog-generation: 1530287629024005
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 6
content-type: text/plain
content-language: en
x-goog-hash: crc32c=sMnOMw==
x-goog-hash: md5=CffgLxKQviEdpweiZvFTsw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 6
server: UploadServer
alt-svc: quic=":443"; ma=2592000; v="43,42,41,39,35"
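As a rough sketch of how the hashes in that response could be pulled out for call caching, the snippet below parses the two `x-goog-hash` values and converts the base64 md5 to hex. The function names are made up for illustration. It also assumes a single header value may carry several comma-separated `alg=digest` pairs, which is worth checking against the GCS docs.

```python
import base64
from typing import Dict, List


def parse_goog_hashes(header_values: List[str]) -> Dict[str, str]:
    """Map algorithm name to base64 digest from x-goog-hash header values."""
    hashes: Dict[str, str] = {}
    for value in header_values:
        # Assumption: one header value may hold several comma-separated pairs.
        for part in value.split(","):
            # Split on the first "=" only; base64 padding also uses "=".
            name, _, digest = part.strip().partition("=")
            hashes[name] = digest
    return hashes


def base64_to_hex(digest: str) -> str:
    """Convert a base64-encoded digest to lowercase hex."""
    return base64.b64decode(digest).hex()


# Header values copied from the HEAD response above.
hashes = parse_goog_hashes(["crc32c=sMnOMw==", "md5=CffgLxKQviEdpweiZvFTsw=="])
md5_hex = base64_to_hex(hashes["md5"])
print(md5_hex)  # 09f7e02f1290be211da707a266f153b3
```

In this sample the hex md5 is the same string as the `etag` value, so either header would give the digest here; whether that holds for every object is not something this response alone shows.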
Introduction
The essence of a presigned URL is that it gives you privileged access to data (via HTTP verbs, usually GET) for a finite amount of time. Some metadata can be obtained via the HEAD verb.
DOS URIs can be resolved to presigned URLs, and it's not immediately obvious how to provide the info Cromwell needs to do its job. Hence this document.
The essence of this question is how we leverage HTTP.
Information Needed for Cromwell to work
read_lines)
Information Provided by OpenDJ / Martha as of 6/25/18
Information provided by HTTP (in theory)
HEAD
RANGE header on GET
Information Not Provided by OpenDJ/Martha as of 6/25/18
Outstanding questions (please comment if you have info)
HEAD?
HEAD metadata a standard, and do all clouds implement that standard? (I think ETag is the common name for this info.)
Range header. Do clouds support this feature? Are there other ways of achieving this requirement?
write_lines, which AFAIK is only possible via PATCH
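For the Range question above, a minimal sketch of what a partial read against a presigned URL might look like from the client side. It only builds the request and parses a `Content-Range` value; nothing is sent over the network. The URL and function names are placeholders, and whether a given cloud honors `Range` on presigned URLs is still the open question.

```python
import urllib.request
from typing import Optional


def first_byte_request(url: str) -> urllib.request.Request:
    """Build a GET that asks only for byte 0 of the object."""
    return urllib.request.Request(url, headers={"Range": "bytes=0-0"}, method="GET")


def total_size(content_range: str) -> Optional[int]:
    """Extract the total object size from a Content-Range value.

    "bytes 0-0/6" gives 6; "bytes 0-0/*" (size unknown) gives None.
    """
    total = content_range.rpartition("/")[2].strip()
    return int(total) if total.isdigit() else None


# Placeholder URL, not a real presigned URL.
req = first_byte_request("https://example.com/presigned-object")
method = req.get_method()
range_header = req.get_header("Range")
size = total_size("bytes 0-0/6")
```

A server that supports ranges would normally answer such a request with status 206 and a `Content-Range` header, so the response could stand in for HEAD when only GET is signed.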