ga4gh / fasp-scripts

Apache License 2.0

Add notebook to demonstrate DRS resolution and authentication #13

Closed: ianfore closed this issue 1 year ago

ianfore commented 3 years ago

See checkResolution() in https://github.com/ga4gh/fasp-scripts/blob/master/fasp/loc/drs_metaresolver.py. That function demonstrates CURIE resolution; we need to do the same for DRS URIs.

While the hackathon title suggests the objective is to produce a (Jupyter) notebook, the real point is to understand the factors that drive different approaches to resolving different styles of DRS IDs, including the use of prefixes.

The use of the two different styles of DRS IDs is also something to explore. I'll revive a previous discussion of this and link or post it below.

ianfore commented 3 years ago

Here's my summary from September.

From: "Fore, Ian (NIH/NCI) [E]" forei@mail.nih.gov Date: Wednesday, September 9, 2020 at 9:29 AM

Regarding resolving the prefixes, I’d like to highlight two approaches that I think should be discussed. The first is already implemented by the getObject method in https://github.com/ianfore/FASPclient/blob/master/DRSMetaResolver.py, but there I have to associate the prefixes with the services in the constructor. Could we put the short prefixes in the registry record? These prefixes are not yet official, so they are likely to change.
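A minimal sketch of the first approach, assuming a hard-coded prefix-to-service map passed to a constructor. The class name `PrefixResolver` and its methods are hypothetical, not the actual `DRSMetaResolver` code; the base URLs are the ones the registry returns later in this thread. The CURIE is split on its first colon only, so a local part such as `dg.4503/...` survives intact.

```python
from typing import Dict, Tuple


class PrefixResolver:
    """Hypothetical sketch: resolve 'prefix:id' CURIEs with a prefix map given at construction."""

    def __init__(self, prefix_map: Dict[str, str]) -> None:
        # prefix (e.g. 'anv') -> base URL of the DRS service it stands for
        self.prefix_map = prefix_map

    @staticmethod
    def split_curie(curie: str) -> Tuple[str, str]:
        # Split on the first colon only; the local id may contain '/' (e.g. 'dg.4503/...')
        prefix, sep, local_id = curie.partition(":")
        if not sep or not prefix or not local_id:
            raise ValueError(f"not a CURIE: {curie!r}")
        return prefix, local_id

    def object_url(self, curie: str) -> str:
        prefix, local_id = self.split_curie(curie)
        if prefix not in self.prefix_map:
            raise KeyError(f"no service registered for prefix {prefix!r}")
        base = self.prefix_map[prefix].rstrip("/")
        return f"{base}/ga4gh/drs/v1/objects/{local_id}"


# Base URLs taken from the registry resolve-uri examples later in this thread
resolver = PrefixResolver({
    "anv": "https://gen3.theanvil.io",
    "bdc": "https://gen3.biodatacatalyst.nhlbi.nih.gov",
    "crdc": "https://nci-crdc.datacommons.io",
})
print(resolver.object_url("anv:737247da-f5da-49a7-86ec-737978eb8293"))
```

If the short prefixes lived in the registry record, the same map could be built from the registry instead of being hard-coded.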

The other approach uses hostname-based URIs. I should be able to add that by extracting the hostname from the service URL already in the registry.
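Extracting hostnames from registry records could look like the sketch below. It assumes each record exposes its base URL in a `url` field; the records shown are illustrative stand-ins rather than real registry output, and `host_index` is a hypothetical helper.

```python
from typing import Dict, Iterable, Mapping
from urllib.parse import urlsplit


def host_index(services: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each service's hostname to its base URL (assumes a 'url' field per record)."""
    index: Dict[str, str] = {}
    for svc in services:
        host = urlsplit(svc["url"]).hostname  # lower-cased, port stripped
        if host:
            index[host] = svc["url"]
    return index


# Illustrative records only; real registry entries carry more fields
services = [
    {"url": "https://gen3.theanvil.io/"},
    {"url": "https://nci-crdc.datacommons.io/"},
]
print(host_index(services)["gen3.theanvil.io"])
```

The host of a `drs://` URI, obtained the same way with `urlsplit(uri).hostname`, can then be looked up in this index.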

The difference between these approaches has user impact.

Hostname-based URIs

drs://gen3.theanvil.io/dg.ANV0/737247da-f5da-49a7-86ec-737978eb8293
drs://gen3.biodatacatalyst.nhlbi.nih.gov/dg.4503/65f34e96-230a-4e20-b15d-8510d688cbf0
drs://nci-crdc.datacommons.io/dg.4DFC/ff59c94b-8124-48a8-8b78-72e71f5d71f0
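For hostname-based URIs, the DRS specification maps `drs://<hostname>/<id>` to `https://<hostname>/ga4gh/drs/v1/objects/<id>`. A minimal sketch of that conversion follows (`drs_uri_to_url` is a hypothetical helper). It passes the object ID through verbatim, including the `dg.` prefix and its slash; whether a given server expects that slash to be percent-encoded is an assumption to check against each service.

```python
from urllib.parse import urlsplit


def drs_uri_to_url(drs_uri: str) -> str:
    """Map drs://<hostname>/<id> to https://<hostname>/ga4gh/drs/v1/objects/<id>."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parts.path.lstrip("/")  # e.g. 'dg.ANV0/737247da-...', kept verbatim
    if not object_id:
        raise ValueError(f"no object id in {drs_uri!r}")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{object_id}"


print(drs_uri_to_url("drs://gen3.theanvil.io/dg.ANV0/737247da-f5da-49a7-86ec-737978eb8293"))
```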

Compact URIs (CURIEs)

anv:dg.ANV0/737247da-f5da-49a7-86ec-737978eb8293
bdc:dg.4503/65f34e96-230a-4e20-b15d-8510d688cbf0
crdc:dg.4DFC/ff59c94b-8124-48a8-8b78-72e71f5d71f0

This should even be possible, and is simpler:

anv:737247da-f5da-49a7-86ec-737978eb8293
bdc:65f34e96-230a-4e20-b15d-8510d688cbf0
crdc:ff59c94b-8124-48a8-8b78-72e71f5d71f0

This might work too

dg.ANV0:737247da-f5da-49a7-86ec-737978eb8293
dg.4503:65f34e96-230a-4e20-b15d-8510d688cbf0
dg.4DFC:ff59c94b-8124-48a8-8b78-72e71f5d71f0
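Since a resolver has to cope with all of these forms, a first step is to recognise which style an ID uses. The sketch below does this with regular expressions; the style labels (`hostname`, `curie_with_dg`, `dg_prefix`, `curie`) and the `classify` helper are made up for illustration. Pattern order matters, because `dg.ANV0:<id>` would otherwise also match the plain CURIE pattern.

```python
import re
from typing import Dict, Tuple

# Hypothetical style labels; order matters because 'dg.ANV0:<id>' also matches the plain CURIE pattern.
ID_STYLES = [
    ("hostname", re.compile(r"^drs://(?P<host>[^/]+)/(?P<id>.+)$")),
    ("curie_with_dg", re.compile(r"^(?P<prefix>[A-Za-z][\w.-]*):(?P<dg>dg\.[A-Za-z0-9]+)/(?P<id>[^:/\s]+)$")),
    ("dg_prefix", re.compile(r"^(?P<dg>dg\.[A-Za-z0-9]+):(?P<id>[^:/\s]+)$")),
    ("curie", re.compile(r"^(?P<prefix>[A-Za-z][\w.-]*):(?P<id>[^:/\s]+)$")),
]


def classify(drs_id: str) -> Tuple[str, Dict[str, str]]:
    """Return the style label and the named parts of a DRS id."""
    for style, pattern in ID_STYLES:
        match = pattern.match(drs_id)
        if match:
            return style, match.groupdict()
    raise ValueError(f"unrecognised DRS id style: {drs_id!r}")


for example in [
    "drs://gen3.theanvil.io/dg.ANV0/737247da-f5da-49a7-86ec-737978eb8293",
    "anv:dg.ANV0/737247da-f5da-49a7-86ec-737978eb8293",
    "anv:737247da-f5da-49a7-86ec-737978eb8293",
    "dg.ANV0:737247da-f5da-49a7-86ec-737978eb8293",
]:
    print(classify(example))
```

Once the style is known, each variant can be handed to the matching resolution path (hostname lookup, prefix map, or registry).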
ianfore commented 3 years ago

Here is Jeremy's approach in the Registry. I don't think this has been shared widely enough, though it is up and working in the Registry. DRSMetaResolver makes some use of it.

From: Jeremy Adams jeremy.adams@ga4gh.org Date: Wednesday, September 9, 2020 at 12:04 PM

I should be able to get the CURIE-based identifiers working. This would show that a compact identifier has meaning and effectively maps to a DRS object / URL using Service Registry. Here's what I propose to add on my end to get this working:

These are my favorites (simple PREFIX:ID structure):

anv:737247da-f5da-49a7-86ec-737978eb8293
bdc:65f34e96-230a-4e20-b15d-8510d688cbf0
crdc:ff59c94b-8124-48a8-8b78-72e71f5d71f0

They should work well for the demo. If this works, I will design the feature with this syntax in mind. We can also add other ID formats later.

ianfore commented 3 years ago

And here is the resolve endpoint. I'm not sure it has been properly looked at.

From: Jeremy Adams jeremy.adams@ga4gh.org Date: Wednesday, September 9, 2020 at 9:36 PM

OK, I have the basic gist of this working. There's an API route, '/resolve-uri', to which you can pass CURIE-style IDs; it returns a JSON response with a single attribute containing the resolved URL. Here are some examples:
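A minimal client for that route might look like this, assuming the endpoint still behaves as described here (a GET on `/v1/resolve-uri/<curie>` returning `{"resolvedURL": ...}`). The route may have changed since 2020, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

REGISTRY = "https://registry.ga4gh.org/v1"  # base taken from the examples in this thread


def resolve_uri_request(curie: str, registry: str = REGISTRY) -> str:
    # Leave ':' and '/' as-is so the path matches the examples; escape anything else unsafe
    return f"{registry}/resolve-uri/{quote(curie, safe=':/')}"


def parse_resolve_response(body: str) -> str:
    # The response described here has a single attribute, 'resolvedURL'
    return json.loads(body)["resolvedURL"]


def resolve(curie: str, timeout: float = 10.0) -> str:
    """Ask the registry to resolve a CURIE (network call; the endpoint may have changed)."""
    with urlopen(resolve_uri_request(curie), timeout=timeout) as resp:
        return parse_resolve_response(resp.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline against the recorded responses.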

SB CGC

Request: https://registry.ga4gh.org/v1/resolve-uri/sbcgc:5baa9d00e4b0abc1388b8ce0 
Response:
{
"resolvedURL": "https://cgc-ga4gh-api.sbgenomics.com/ga4gh/drs/v1/objects/5baa9d00e4b0abc1388b8ce0"
}

SB Cavatica

Request: https://registry.ga4gh.org/v1/resolve-uri/sbcav:5772b6ed507c1752674486fc
Response:
{
"resolvedURL": "https://cavatica-ga4gh-api.sbgenomics.com/ga4gh/drs/v1/objects/5772b6ed507c1752674486fc"
}

CRDC

Request: https://registry.ga4gh.org/v1/resolve-uri/crdc:f360253c-d7d7-47cb-947a-b26e0b41b800
Response:
{
"resolvedURL": "https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/f360253c-d7d7-47cb-947a-b26e0b41b800"
}

ANVIL

Request: https://registry.ga4gh.org/v1/resolve-uri/anv:737247da-f5da-49a7-86ec-737978eb8293
Response:
{
"resolvedURL": "https://gen3.theanvil.io/ga4gh/drs/v1/objects/737247da-f5da-49a7-86ec-737978eb8293"
}

BioData Catalyst

Request: https://registry.ga4gh.org/v1/resolve-uri/bdc:66eeec21-aad0-4a77-8de5-621f05e2d301
Response:
{
"resolvedURL": "https://gen3.biodatacatalyst.nhlbi.nih.gov/ga4gh/drs/v1/objects/66eeec21-aad0-4a77-8de5-621f05e2d301"
}
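One way to check whether registry-based resolution agrees with an existing prefix map is to run both over the same CURIEs and collect any disagreements. In this sketch, `RECORDED` holds two of the `resolvedURL` values shown above, and `local_resolver` with its `LOCAL_BASES` map is a hypothetical stand-in for an existing script; in practice the first resolver would call the registry instead of reading recorded values.

```python
from typing import Callable, Iterable, List, Tuple

Resolver = Callable[[str], str]


def compare_resolvers(curies: Iterable[str], first: Resolver, second: Resolver) -> List[Tuple[str, str, str]]:
    """Return (curie, first_url, second_url) for every CURIE where the two resolvers disagree."""
    mismatches = []
    for curie in curies:
        a, b = first(curie), second(curie)
        if a != b:
            mismatches.append((curie, a, b))
    return mismatches


# resolvedURL values recorded from the registry responses above (two of five, for brevity)
RECORDED = {
    "crdc:f360253c-d7d7-47cb-947a-b26e0b41b800": "https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/f360253c-d7d7-47cb-947a-b26e0b41b800",
    "bdc:66eeec21-aad0-4a77-8de5-621f05e2d301": "https://gen3.biodatacatalyst.nhlbi.nih.gov/ga4gh/drs/v1/objects/66eeec21-aad0-4a77-8de5-621f05e2d301",
}

# Hypothetical local prefix map standing in for an existing script
LOCAL_BASES = {
    "crdc": "https://nci-crdc.datacommons.io",
    "bdc": "https://gen3.biodatacatalyst.nhlbi.nih.gov",
}


def local_resolver(curie: str) -> str:
    prefix, _, local_id = curie.partition(":")
    return f"{LOCAL_BASES[prefix]}/ga4gh/drs/v1/objects/{local_id}"


print(compare_resolvers(RECORDED, RECORDED.__getitem__, local_resolver))  # prints []
```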

So basic registry-based resolution is working for these 5 examples. I don't have tokens for any of these services, so I can't test whether a follow-up request returns a valid DRS response. It would be great to see whether this new method produces the same results as your current script.
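The follow-up request would be an authenticated GET on the resolved URL. This sketch assumes the service accepts an OAuth2-style bearer token in the `Authorization` header (as Gen3-based services typically do) and checks for the fields the DRS 1.x schema lists as required on a DRS object, which is worth confirming against the spec version each service implements. The helper names are hypothetical, and obtaining the token is out of scope.

```python
import json
from typing import Any, Dict
from urllib.request import Request, urlopen

# Fields the DRS 1.x schema lists as required on a DrsObject (confirm against the service's version)
REQUIRED_FIELDS = ("id", "self_uri", "size", "created_time", "checksums")


def drs_object_request(object_url: str, token: str) -> Request:
    # Assumes an OAuth2-style bearer token is accepted in the Authorization header
    return Request(
        object_url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def looks_like_drs_object(obj: Dict[str, Any]) -> bool:
    return all(field in obj for field in REQUIRED_FIELDS)


def fetch_drs_object(object_url: str, token: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET the resolved URL with a token and check the body looks like a DRS object."""
    with urlopen(drs_object_request(object_url, token), timeout=timeout) as resp:
        obj = json.load(resp)
    if not looks_like_drs_object(obj):
        raise ValueError(f"response from {object_url} is missing required DRS fields")
    return obj
```

An HTTP 401 or 403 from `urlopen` (raised as `urllib.error.HTTPError`) would indicate the token was missing, expired, or lacked access to that object.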

I can also expand this out and provide more attributes in the resolution response (e.g. was resolution successful, what is the name of the service this mapped to, etc.). If there are any key attributes you'd like to see, please let me know.

ianfore commented 3 years ago

Need to add the example of the different DRS ID styles used by BDC and AnVIL here. I have slides!

ianfore commented 1 year ago

This was essentially completed as part of the ISMB 2022 tutorial. See the DRS id variants notebook.