ga4gh-metadata / SchemaBlocks

Building Blocks and Schemas for GA4GH Implementations
https://ga4gh-metadata.github.io/schemas/

Status of CURIEs in SchemaBlocks #10

Open cmungall opened 5 years ago

cmungall commented 5 years ago

The docs mention the use of CURIEs for ontology classes, but there are some invalid CURIEs used in this context, e.g.:

https://github.com/ga4gh-metadata/SchemaBlocks/blob/4cc04a8c16b672e068d41fc95b3200233e128bfa/main/json/biosample-Biosample-example.json#L54

In other cases, the CURIE has been lowercased from its standard form:

https://github.com/ga4gh-metadata/SchemaBlocks/blob/4cc04a8c16b672e068d41fc95b3200233e128bfa/main/json/biosample-Biosample-example.json#L67

CURIEs/URIs are case-sensitive, so it's important to use the standard form. I realize there is an inconsistency between OBO and identifiers.org, with the former using NCIT and the latter using ncit, but nobody lowercases the leading "C" in the local fragment portion.
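To illustrate the case-sensitivity point, here is a minimal Python sketch. The helper names `split_curie` and `same_curie` are hypothetical, and the policy they implement (tolerate a prefix-case difference such as NCIT vs ncit, but compare the local part exactly) is an assumption made for illustration, not a rule taken from OBO or identifiers.org.

```python
from typing import Tuple


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE into (prefix, local_id) at the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    return prefix, local_id


def same_curie(a: str, b: str) -> bool:
    """Hypothetical comparison policy (an assumption, see lead-in):
    prefixes are compared case-insensitively, local parts exactly."""
    prefix_a, local_a = split_curie(a)
    prefix_b, local_b = split_curie(b)
    return prefix_a.lower() == prefix_b.lower() and local_a == local_b


# Only the prefix case differs (OBO style vs identifiers.org style).
print(same_curie("NCIT:C2930", "ncit:C2930"))
# The local part was lowercased, so this is treated as a different ID.
print(same_curie("NCIT:C2930", "ncit:c2930"))
```

Under this policy the first comparison succeeds and the second fails, which matches the concern above: a lowercased local part no longer identifies the same class.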

Also, why not extend the use of CURIEs for all identifiers, rather than just ontology classes?

mbaudis commented 5 years ago

@cmungall Thanks for this. The wrong case example needs a fix (it is actually in our data where we pull the examples from). Not sure about the problem with the pgx example, since this is a registered namespace at identifiers.org and therefore would point to a scoped class (though in this case it has no working link right now and is also in the wrong case; in "form" it should be correct, IMHO?).

I'll fix this, but there will be a new repository :-)

cmungall commented 5 years ago

If you follow the W3C CURIE spec into https://www.ietf.org/rfc/rfc3987.txt, you'll see that (I think) a double ":" is not allowed in a CURIE (from my reading of the BNF).

I wish identifiers.org would get rid of the double colon IDs, like MGI:MGI:nnn

ianfore commented 5 years ago

Just noticed the unregistered prefix in the SchemaBlocks examples. The same example when it shows up here does not use the pgx prefix - just icdot:c25.9. icdot doesn't resolve at n2t.org or identifiers.org. They have icd: registered as the prefix for ICD. (That leaves open the question of how to prefix an ICD-9 vs ICD-10 code though).

Also, the example ID registered with identifiers.org for pgx is broken: pgx:icdom:8500_3

Also, why would one resolve ICD codes via a repository for experimental data rather than via a system of record for those codes?

icd:C25.9 should work, but that's broken too. That seems to be an identifiers.org problem. It should resolve to http://apps.who.int/classifications/icd10/browse/2010/en#/C25.9 - but it doesn't.

Agree with using CURIEs for things other than ontologies.

mbaudis commented 5 years ago

@ianfore Correct, and we should replace this ASAP. It goes back to (and was actually used to illustrate) the "well, we should use ontologies and CURIEs, but get sadly disappointed when doing so, so here is an example using a custom representation of ICD-O 3..." argument. And then the pgx mapping at our backend broke. So, yes, apologies, this will be rewritten to ncit examples ASAP. And anybody is welcome to PR nice examples. But then, it says "draft" ...

ianfore commented 5 years ago

DUO is also not a registered CURIE prefix. Seems like a good candidate for registration.

ianfore commented 5 years ago

@mbaudis All understood - it's a good vision. Just highlighting some things we need to work on if this is going to work.

mbaudis commented 5 years ago

> DUO is also not a registered CURIE prefix. Seems like a good candidate for registration.

Pinging @mcourtot ...

ianfore commented 5 years ago

Also pinging @sarala

mbaudis commented 5 years ago

@ianfore @cmungall O.k., picking this up again: So a public prefix (pgx) followed by a local identifier using an internal prefix syntax (icdom:85003) is wrong? While this means an internal representation derived from an existing classification (ICD-O 3 in its various flavours), there is no public icdom prefix - this is just internal. And there are definitions allowing concatenated prefixes - bad advice?

mbaudis commented 5 years ago

@ianfore Just a note about a fix (thanks!):

... resolve again. Still ignorant about the use of multiple prefixes (i.e., private after public). Good topic for documentation by experts :-)

cmungall commented 5 years ago

To be a valid CURIE there should be a single :. The expectation is that there is an agreed upon prefix list. Usually with W3C standards the prefixmap is part of the document in which the CURIE is embedded.
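As a rough illustration of how a CURIE and an agreed prefix map fit together, here is a minimal Python sketch. The function name `expand_curie` and the contents of `PREFIX_MAP` are assumptions for this example, not an agreed GA4GH prefix list; the icd entry is simply derived from the WHO URL quoted earlier in this thread. The check enforces the single-colon rule as stated here, which is a simplification and not a full implementation of the W3C CURIE grammar.

```python
# Illustrative prefix map; these entries are assumptions for the sketch,
# not an agreed or authoritative prefix list.
PREFIX_MAP = {
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "icd": "http://apps.who.int/classifications/icd10/browse/2010/en#/",
}


def expand_curie(curie: str, prefix_map: dict) -> str:
    """Expand 'prefix:local_id' to a full URI using the given prefix map.

    Enforces the single-colon rule as stated in this thread, which is
    simpler and stricter than the full W3C CURIE grammar.
    """
    if curie.count(":") != 1:
        raise ValueError(f"expected exactly one ':' in {curie!r}")
    prefix, local_id = curie.split(":")
    if prefix not in prefix_map:
        raise KeyError(f"prefix not in prefix map: {prefix!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("icd:C25.9", PREFIX_MAP))
print(expand_curie("NCIT:C2930", PREFIX_MAP))
```

With this rule, an identifier such as pgx:icdom:85003 is rejected with a `ValueError` before any lookup happens, and a prefix that is missing from the map raises a `KeyError`.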

sarala commented 5 years ago

Hi All,

Identifiers.org has pgx and icd registered. Unfortunately, we have not been able to avoid the multiple colon problem for pgx.

pgx:ncit:C2930
pgx:icdom:85003
pgx:icdom:8170_3
icd:C25.9

Please fill in our prefix registration form to add a new prefix to the identifiers.org registry.
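For anyone auditing example files for the multiple-colon problem, a small Python sketch along these lines may help. The helper name `extra_colons` is hypothetical, and the input list is just the four identifiers quoted in this comment.

```python
def extra_colons(identifier: str) -> int:
    """Count colons beyond the single one that separates prefix and local ID."""
    return max(identifier.count(":") - 1, 0)


# The four identifiers quoted in the comment above.
examples = ["pgx:ncit:C2930", "pgx:icdom:85003", "pgx:icdom:8170_3", "icd:C25.9"]

# Identifiers that would fail a strict single-colon check.
flagged = [identifier for identifier in examples if extra_colons(identifier) > 0]
print(flagged)
```

This only counts colons; it does not check whether a prefix is registered or whether the identifier resolves.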