ga4gh / data-repository-service-schemas

A repository for the schemas used for the Data Repository Service.
Apache License 2.0

GTEX DRS ids double up the prefix #340

Open ianfore opened 3 years ago

ianfore commented 3 years ago

DRS ids for GTEX files show up as follows, for example: drs://dg.ANV0:dg.ANV0/d7428364-7bab-4126-8162-0b9b549b649e

The doubling up of the "prefix" seems less than optimal and is not what I believed the prefix discussion for 1.1 was leading to. Perhaps we need a review of the approaches to prefixing, ids and resolution.

briandoconnor commented 3 years ago

I spoke with Jiaqi in late 2020 about this; it's their plan to eventually remove the double IDs. It's not a DRS issue, it's an issue with the implementation. Want to file a bug with U. Chicago's repo?

ianfore commented 3 years ago

Understand that it's the implementation and not the spec. Two comments

On the last point: #341 is another example where an implementation did something that is within spec but tells us something about the spec.

ianfore commented 3 years ago

I didn't note where I got the DRS id at the start of this issue that showed the original duplication. However, the self_uri in the DRS response from Anvil DRS for d7428364-7bab-4126-8162-0b9b549b649e currently shows as drs://gen3.theanvil.io/dg.ANV0/d7428364-7bab-4126-8162-0b9b549b649e

So there is no duplication of the prefix. However, it introduces another kind of duplication: using both the host name and a prefix in the same URI. Host name based URIs and Compact URIs (prefix based) were intended to be alternatives. In this case the host name based URI would be a) drs://gen3.theanvil.io/d7428364-7bab-4126-8162-0b9b549b649e and the prefix based URI would be b) drs://dg.ANV0:d7428364-7bab-4126-8162-0b9b549b649e

In both cases the URI would be resolved to the following URL, to which the request is sent:

c) https://gen3.theanvil.io/ga4gh/drs/v1/objects/d7428364-7bab-4126-8162-0b9b549b649e

In the case of a) the URI can be resolved to the URL locally. In the case of b) a metaresolver is needed to obtain the host name. For performance, local metaresolvers are allowed, as is caching of the prefix-to-host mapping.
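To make the two resolution paths concrete, here is a minimal Python sketch of how a client might turn either URI form into URL c). It is an illustration only, not the code of any existing resolver: the function name `resolve_drs_uri` and the table `PREFIX_TO_HOST` are hypothetical, the table stands in for a local metaresolver (or its cache), and only the dg.ANV0 entry comes from this thread. The sketch also assumes that a host name based URI never contains a colon (no port), so a colon marks the compact form.

```python
DRS_SCHEME = "drs://"

# Hypothetical local prefix-to-host table acting as a tiny metaresolver cache.
# Only the dg.ANV0 -> gen3.theanvil.io pair is taken from this thread.
PREFIX_TO_HOST = {"dg.ANV0": "gen3.theanvil.io"}


def resolve_drs_uri(uri, prefix_map=PREFIX_TO_HOST):
    """Resolve a DRS URI to the https URL of the DRS objects endpoint."""
    if not uri.startswith(DRS_SCHEME):
        raise ValueError(f"not a DRS URI: {uri}")
    rest = uri[len(DRS_SCHEME):]

    if ":" in rest:
        # Compact (prefix based) form: drs://<prefix>:<id>.
        # The host name has to be looked up, here in the local table.
        prefix, object_id = rest.split(":", 1)
        if prefix not in prefix_map:
            raise KeyError(f"unknown prefix: {prefix}")
        host = prefix_map[prefix]
    elif "/" in rest:
        # Host name based form: drs://<host>/<id>, resolvable locally.
        host, object_id = rest.split("/", 1)
    else:
        raise ValueError(f"cannot split host or prefix from id: {uri}")

    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}"


if __name__ == "__main__":
    object_id = "d7428364-7bab-4126-8162-0b9b549b649e"
    print(resolve_drs_uri(f"drs://gen3.theanvil.io/{object_id}"))  # form a)
    print(resolve_drs_uri(f"drs://dg.ANV0:{object_id}"))           # form b)
```

Under these assumptions both calls return the same URL, which is the point of treating a) and b) as alternatives. A real metaresolver would replace the dictionary lookup with a query to a registry and cache the result.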

Note that URL c) currently works, even though it does not include dg.ANV0 anywhere within it. Some changes seem to have been made to Gen3 DRS servers, perhaps as part of the actions Brian mentioned above. These changes appear to apply to the CRDC, BioDataCatalyst and Gen3 DRS servers too, so practically everything is in place for this to work as intended. What remains is:

I have updated the example local DRSMetaResolver with the data necessary to resolve three Gen3 prefixes (dg.ANV0, dg.4DFC and dg.4503).