ga4gh / data-repository-service-schemas

A repository for the schemas used for the Data Repository Service.
Apache License 2.0

Object metadata and download methods #213

Closed: sarpera closed this issue 5 years ago

sarpera commented 5 years ago

Background

Following the discussion at the GA4GH hackathon in January, we propose one method to get the metadata of an object and a separate method to download the object.

We propose two methods rather than one because the download requires signing the object using the authorisation token provider (currently based on the OIDC specifications), which is computationally expensive and should therefore only happen when the client actually requests a download. Moreover, because each URI carries a region and a provider, a DRS client can decide which provider and region are best for obtaining the file, among all the possible URIs.

The formats we propose are:

We also propose passing the authorisation token in a request header to gain access to the object.

From a DRS client's point of view, the flow is:

1) GET /objects/<id>

2) GET /objects/<id>/download with Request Header X-DRS-TOKEN: <TOKEN>

The client obtains the token from the DRS server; how a user obtains it is left to the DRS server implementer.
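A minimal Python sketch of this two-step flow, assuming a hypothetical DRS server at `base_url` whose JSON bodies are shaped like the examples in this proposal. The names `drs_download_url`, `http_fetch` and the injectable `fetch` parameter are illustrative only; they are not part of the proposal. The cloud URI is passed as an ordinary percent-encoded query parameter.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical transport: takes a prepared Request, returns the decoded JSON body.
Fetch = Callable[[urllib.request.Request], dict]


def http_fetch(req: urllib.request.Request) -> dict:
    """Send the request over the network and decode the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def drs_download_url(base_url: str, object_id: str, token: str,
                     fetch: Fetch = http_fetch) -> str:
    """Run the metadata -> download flow and return the URI serving the bytes."""
    path = f"{base_url}/objects/{urllib.parse.quote(object_id, safe='')}"

    # Step 1: GET /objects/<id>, sent without a token header as in the flow.
    obj = fetch(urllib.request.Request(path))["object"]

    # Choose a cloud URI (this sketch simply takes the first one listed).
    chosen = obj["urls"]["cloud"][0]["uri"]

    # Step 2: GET /objects/<id>/download with the token in X-DRS-TOKEN.
    # urlencode percent-encodes the gs:// or s3:// URI for the query string.
    query = urllib.parse.urlencode({"type": "cloud", "uri": chosen})
    req = urllib.request.Request(f"{path}/download?{query}",
                                 headers={"X-DRS-TOKEN": token})
    return fetch(req)["uri"]
```

Injecting `fetch` keeps the flow testable without a live server. Note that `urllib.request.Request` stores the header as `X-drs-token`; HTTP header names are case-insensitive, so a server should treat it the same as `X-DRS-TOKEN`.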

Object metadata Request

This will return the object metadata:

HTTP Request

GET /objects/<id>

HTTP Response

{
  "object": {
    "id": "string",
    "name": "string",
    "size": "string",
    ...
    "urls": {
      "cloud": [
        {
          "uri": "s3://<foo>/<bar>.bam",
          "region": "us-east-1",
          "provider": "aws"
        },
        {
          "uri": "gs://<foo>/<bar>.bam",
          "region": "us-west1",
          "provider": "google"
        }
      ],
      "ftp": [
        {
          "uri": "ftp://foo.com/bar.bam"
        }
      ],
      "drs": [
        {
          "uri": "drs://foo.com/objects/<id>"
        }
      ]
    },
    "aliases": [
      "doi://123/abcd"
    ]
  }
}

The client can then pick one of the cloud URIs and request a download URI for it, passing the token.
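One way a client might pick among the `cloud` entries is sketched below in Python. The preference order (same provider and region, then same provider, then any cloud URI) is an assumption of this sketch, not something the proposal prescribes, and `pick_cloud_uri` is a hypothetical helper name.

```python
from typing import Optional


def pick_cloud_uri(obj: dict, provider: str,
                   region: Optional[str] = None) -> Optional[str]:
    """Return a cloud URI from object metadata, preferring provider/region.

    Assumed preference order:
      1. matching provider and matching region,
      2. matching provider, any region,
      3. the first cloud URI listed.
    Returns None if the object lists no cloud URIs.
    """
    clouds = obj.get("urls", {}).get("cloud", [])
    for entry in clouds:
        if entry.get("provider") == provider and entry.get("region") == region:
            return entry["uri"]
    for entry in clouds:
        if entry.get("provider") == provider:
            return entry["uri"]
    return clouds[0]["uri"] if clouds else None
```

Called on the `"object"` value of the metadata response, `pick_cloud_uri(obj, "google", "us-west1")` would select the `gs://` entry from the example above.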

Object download Request

HTTP Request

GET /objects/<id>/download?type=cloud&uri=gs://<foo>/<bar>.bam

    Request Header: 
    X-DRS-TOKEN: <TOKEN>

HTTP Response

The response contains a URI from which a GET request returns the object's bytes:

{
  "uri": "<URL_TO_BYTES>"
}

A GET on <URL_TO_BYTES> starts the download of the file.
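A minimal Python sketch of this last step: read `uri` from the download response and stream the bytes to a local file. It assumes, as the text describes, that a plain GET on <URL_TO_BYTES> is enough, so no token is sent; `download_bytes` is a hypothetical helper name.

```python
import json
import shutil
import urllib.request


def download_bytes(download_response: str, dest_path: str,
                   chunk_size: int = 1 << 20) -> None:
    """Follow the {"uri": ...} body returned by /download and save the bytes.

    No X-DRS-TOKEN header is sent here: this sketch assumes <URL_TO_BYTES>
    is usable with a plain GET, as the proposal describes.
    """
    uri = json.loads(download_response)["uri"]
    # Copy in chunks rather than reading the whole object into memory.
    with urllib.request.urlopen(uri) as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Streaming with `shutil.copyfileobj` matters for large objects such as BAM files, which may not fit in memory.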

susheel commented 5 years ago
  • Re "Is there a use case where a user will only need the access_id?": no, I don't see one. You have to turn the id into a URL somehow in order to fetch object content.

@dglazer If there isn't a use case for providing only an access_id, and the user MUST call /access to get the bytes, you might as well make this explicit in the spec (not via comments), for example with an explicit access_url that service providers can use to configure different mechanisms for obtaining the token.
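The distinction discussed here can be sketched in Python as a client-side resolver. The field names `access_url` and `access_id` come from this thread, but their exact shapes here (a plain string URL, an opaque id) are assumptions; `resolve_access_url` and the `call_access` callback standing in for the /access endpoint are hypothetical.

```python
from typing import Callable, Optional


def resolve_access_url(method: dict,
                       call_access: Callable[[str], str]) -> str:
    """Turn one access method into a URL the client can GET.

    Hypothetical shape: `method` carries either an "access_url" that can
    be used directly, or an "access_id" that must be exchanged through the
    /access endpoint, modelled here by `call_access` (id in, URL out).
    """
    url: Optional[str] = method.get("access_url")
    if url:
        return url
    access_id = method.get("access_id")
    if access_id is None:
        raise ValueError("access method has neither access_url nor access_id")
    # An access_id on its own is never enough: it has to be resolved
    # into a URL before any object bytes can be fetched.
    return call_access(access_id)
```

Making the id-to-URL step explicit in this way is what the comment argues for: the client never has to guess whether an identifier is directly fetchable.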

Happy to commit to whichever way the community would like to proceed.

sarpera commented 5 years ago

The PR is up. I kept it as simple as possible for the initial merge.

Points not covered in the PR:

dglazer commented 5 years ago

I believe #236 and #248 cover all the main points in this discussion, so am closing it. Let's start a new issue if there are any loose ends.