Closed bcorrie closed 2 years ago
@bcorrie Your summary is correct; just note that the ALLCAPS prefix is an AIRR convention, it is not part of the W3C TR. Furthermore, the TR does not make any statements on case-insensitivity, therefore I assume that this means that prefix-matching is case-sensitive. As the key purpose of using CURIEs is to abstract the actual IRI, we should use only one capitalization scheme per prefix.
What's the action item on this? Is there anything for v1.4 here?
@javh No, this is an issue of a repository in the ADC, IMO the expected standard is clear. If not, we need more documentation, but that's something for v2.0.
nothing to be done by airr-standards
validating resolution of ontology terms within a DataFile would be a nice option/enhancement.
I agree no action for v1.4 - this predates the rule about no issues until 1.4 is released (I think 8-)
We can reuse our code in the AIRR python library if we want, and make it an option. Just to point out that it does actually query OLS and check the ontology terms, so it is a pretty substantial operation...
Also, it does strict checking in the sense that it checks the label from OLS for an exact string match, so it does not consider label aliases and does not accept case differences. So a label of "Human" or "homo sapiens" for "NCBITAXON:9606" will fail because the strict match is "Homo sapiens".
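To illustrate what "strict" means here, a minimal sketch (not the AIRR Python library's actual code). The function name `label_matches_strict` is hypothetical, and the OLS lookup is replaced by a hard-coded label so the example runs without a network call:

```python
def label_matches_strict(submitted_label: str, ols_label: str) -> bool:
    """Exact, case-sensitive comparison: no aliases, no normalisation."""
    return submitted_label == ols_label


# Assumption: stand-in for the label an OLS query would return for NCBITAXON:9606.
ols_label = "Homo sapiens"

# Check the three labels mentioned above against the strict rule.
results = {
    label: label_matches_strict(label, ols_label)
    for label in ["Homo sapiens", "homo sapiens", "Human"]
}

for label, ok in results.items():
    print(f"{label!r}: {'pass' if ok else 'fail'}")
```

Only the exact string passes; the alias and the lower-case variant both fail, which is the behaviour described above.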
@bussec and @schristley it is my understanding that the correct CURIE format is for the CURIE prefix to be defined in the AIRR Spec, and that prefix is a "precise" definition that maps to other possible uses in various IRIs depending on the provider. For example, the AIRR CURIE prefix in the spec for the NCBI Taxonomy is `NCBITAXON`, all upper case. So a correct CURIE would be `NCBITAXON:9606`. That sometimes maps to `NCBITaxon` in some providers, but an AIRR CURIE of `NCBITaxon:9606` would be incorrect, I think? Do I have that correct?

On the upcoming v4.0 release of the Gateway (using AIRR v1.4, ADC API v1.2), we are now searching on Taxonomy IDs for accuracy (rather than the possibly ambiguous label) and currently this is an exact string match. So something with a CURIE prefix that doesn't match the exact CURIE prefix string will not be found. I think this is the correct behaviour but wanted to confirm.
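A rough sketch of the case-sensitive prefix check I have in mind, in case it helps the discussion. The names `AIRR_PREFIXES`, `split_curie` and `is_airr_curie` are made up for this example, and the prefix set contains only the one prefix mentioned in this thread, not the full list from the AIRR Spec:

```python
from typing import Optional, Tuple

# Assumption: illustrative only; the real set of prefixes is defined by the AIRR Spec.
AIRR_PREFIXES = {"NCBITAXON"}


def split_curie(curie: str) -> Optional[Tuple[str, str]]:
    """Split 'PREFIX:local_id' at the first colon; return None if malformed."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        return None
    return prefix, local_id


def is_airr_curie(curie: str) -> bool:
    """True only if the prefix matches a known AIRR prefix exactly (case-sensitive)."""
    parts = split_curie(curie)
    return parts is not None and parts[0] in AIRR_PREFIXES


for candidate in ["NCBITAXON:9606", "NCBITaxon:9606", "NCBITaxon_9606"]:
    print(candidate, is_airr_curie(candidate))
```

With an exact match like this, only `NCBITAXON:9606` is accepted, which is the same outcome as the exact string search described above.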
@schristley we are seeing CURIEs with either `NCBITaxon:9606` or the older style `NCBITaxon_9606` on VDJServer. This will cause problems when searching non-compliant ontology fields (assuming these are non-compliant) as the query will not match. Not sure how widespread this is. Today (iReceptor v3.0) we are searching on the ontology label, and long ago we fixed the mismatched label fields in all of our repositories. But it looks like we didn't do this for the Ontology IDs? Assuming I am correct above, I am hoping you can correct these?

@schristley we have some ontology checking code (https://github.com/sfu-ireceptor/sandbox/tree/master/ontology-check) that you can use to check your repository for cases like this. It did indeed find some non-compliant ontology CURIEs:
It looks like there are probably a couple of studies that maybe have the different CURIEs?
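To narrow down which studies are affected, something along these lines might help. This is a hypothetical sketch, not the ontology-check code linked above; the function names, the study IDs and the sample records are all made up for illustration:

```python
from collections import defaultdict

EXPECTED_PREFIX = "NCBITAXON"  # the AIRR prefix discussed in this thread


def classify(term_id: str) -> str:
    """Label an ontology ID as compliant or name the way it deviates."""
    if term_id.startswith(EXPECTED_PREFIX + ":"):
        return "compliant"
    if term_id.upper().startswith(EXPECTED_PREFIX + ":"):
        return "wrong_case"
    if term_id.upper().startswith(EXPECTED_PREFIX + "_"):
        return "underscore_style"
    return "other"


def non_compliant_by_study(records):
    """Group non-compliant IDs by study; records are (study_id, term_id) pairs."""
    report = defaultdict(set)
    for study_id, term_id in records:
        kind = classify(term_id)
        if kind != "compliant":
            report[study_id].add((term_id, kind))
    return dict(report)


# Made-up sample data, only to show the shape of the report.
sample = [
    ("study-A", "NCBITAXON:9606"),
    ("study-B", "NCBITaxon:9606"),
    ("study-C", "NCBITaxon_9606"),
]
report = non_compliant_by_study(sample)
for study_id, problems in sorted(report.items()):
    print(study_id, sorted(problems))
```

Grouping by study makes it easier to see whether the problem is confined to a couple of studies or spread across the repository.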