Closed bussec closed 2 years ago
@bussec wondering what the rationale is for having a map and a provider, and how CURIEMap is intended to work with InformationProvider.
If I understand correctly, CURIEMap with the map field essentially replicates what we used to have. It allows you to resolve a CURIE with a resolved IRI to get a mapping...
Is the intent of the provider to denote services that can provide structured, machine-readable responses, as opposed to map, which provides basic lookup and takes you to a web page?
If I am understanding things correctly, it looks like ROR in InformationProvider is incorrect. It is:
ROR:
    request:
        url: "https://api.ror.org/organizations/{iri}"
        response: application/json
But I think it should be:
ROR:
    request:
        url: "https://api.ror.org/organizations/{local_id}"
        response: application/json
Similar to the ORCID entry.
And I think ORCID returns XML, not JSON.
Holding off making any changes until I confirm I understand correctly 8-)
@bcorrie Trying to address all of your questions below:
CURIEMap is supposed to expand a CURIE to a full IRI, which can serve as a unique identifier of a concept/instance.
InformationProvider is supposed to specify how a machine-readable record of the concept/instance can be retrieved. Preference will be given to JSON format, but this might not always be available (indicated via request.response). Unless defined otherwise, it is assumed that the final URI will be used to perform an HTTP GET request without any additional information.
[...] https://api.ror.org/organizations?query={iri} as request.url. I would prefer to use IRIs instead of local IDs whenever possible, as it is closer to the way they are used in linked data.
[...] Accept field in the HTTP header. I assume that this is the preferred format for us.
@bussec do you want to have a look at my changes and provide some feedback? In particular, wondering about the terminology in the intro section where I talk about CURIEMap and InformationProvider.
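For what it's worth, the split described above (CURIEMap expands a CURIE to an IRI, InformationProvider says how to fetch a record) could be sketched in Python roughly as follows. This is only an illustration, not AIRR library code: expand_curie and build_request_url are hypothetical helpers, the CURIE_MAP entry is a trimmed-down stand-in for the real one, the example.org templates are placeholders, and percent-encoding the substituted value is an assumption.

```python
from urllib.parse import quote

# Trimmed-down, illustrative CURIEMap instance (not the full AIRR spec entry).
CURIE_MAP = {
    "NCBITAXON": {
        "type": "taxonomy",
        "default": {"map": "OBO", "provider": "OLS"},
        "map": {"OBO": {"iri_prefix": "http://purl.obolibrary.org/obo/NCBITaxon_"}},
    },
}


def expand_curie(curie, curie_map=CURIE_MAP, map_name=None):
    """Expand a CURIE such as 'NCBITAXON:9606' to a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in curie_map:
        raise ValueError(f"Cannot expand CURIE: {curie!r}")
    entry = curie_map[prefix]
    # Fall back to the default map of this prefix if none is requested.
    map_name = map_name or entry["default"]["map"]
    return entry["map"][map_name]["iri_prefix"] + local_id


def build_request_url(template, curie, curie_map=CURIE_MAP):
    """Fill a request.url template that uses {iri} and/or {local_id}."""
    local_id = curie.partition(":")[2]
    iri = expand_curie(curie, curie_map)
    # Assumption: substituted values are percent-encoded before insertion.
    return template.format(iri=quote(iri, safe=""), local_id=quote(local_id, safe=""))


print(expand_curie("NCBITAXON:9606"))
print(build_request_url("https://example.org/records/{local_id}", "NCBITAXON:9606"))
print(build_request_url("https://example.org/records?query={iri}", "NCBITAXON:9606"))
```

The only difference between the {iri} and {local_id} variants discussed above is which of the two values ends up in the template; the resulting URL would then be used for a plain HTTP GET.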
@bcorrie Thanks, looks good. I realized that with the clear split between CURIE mapping and data retrieval, the term provider has become fuzzy, as it can either refer to an [...] (iri_prefix), or an [...] (url).
Maybe Authority is the better term for the former one... will ponder upon this when writing the report.
@bussec Can I merge this PR, or still working on it?
@schristley No, I am still working on it... hope to complete it by the end of the week.
IEDB has released a beta query API for their database.
Of interest is they have a curie_map endpoint which returns a structure similar to our CURIEMap. I've noticed however that the IRI goes to the human-readable HTML page instead of the API which returns machine-readable JSON. For example, IEDB_EPITOPE:7355 resolves to:
https://www.iedb.org/epitope/7355
versus
https://query-api.iedb.org/epitope_search?structure_id=eq.7355
The nice thing is that now we should be able to add a single field in Rearrangement if we want to link to an epitope (#44).
Though that opens the question of whether we should put entries into our CURIEMap for resolving IEDB_EPITOPE or if we should use IEDB's?
We also can consider how we might link with receptors. The IEDB API has tcr_search and bcr_search endpoints:
https://query-api.iedb.org/tcr_search?limit=5
https://query-api.iedb.org/bcr_search?limit=5
The receptor ids have IEDB_RECEPTOR as their CURIE prefix, though oddly it's missing from the above curie_map. However, we might want to consider how we can link our Receptor to IEDB's.
This is very nice... Should we take this up on #44 as to if/how to add this?
Cool, if I find a CDR3 of interest in an AIRR-seq data set, I can ask IEDB if it has any known antigen specificity...
curl 'https://query-api.iedb.org/tcr_search?receptor_chain2_cdr3_seq=eq.ASSPPGLSQSYGYT'
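The same lookup could be scripted in Python. A minimal sketch, assuming the filter syntax visible in the URLs above (field=eq.value); iedb_search_url and fetch_json are hypothetical helper names, and nothing is assumed about the shape of the JSON that comes back.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

IEDB_QUERY_API = "https://query-api.iedb.org"


def iedb_search_url(endpoint, **filters):
    """Build a query URL with one equality filter per keyword argument."""
    query = urlencode({field: f"eq.{value}" for field, value in filters.items()})
    return f"{IEDB_QUERY_API}/{endpoint}?{query}"


def fetch_json(url):
    """Perform an HTTP GET and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Same query as the curl call: which TCRs in IEDB carry this CDR3?
url = iedb_search_url("tcr_search", receptor_chain2_cdr3_seq="ASSPPGLSQSYGYT")
print(url)
# records = fetch_json(url)  # uncomment to actually query IEDB
```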
@bcorrie @schristley
> This is very nice... Should we take this up on #44 as to if/how to add this?
[...] Receptor objects I now created #540.
Note to self: Make sure that #465 is included in here (or at least not in conflict).
Recent discussion at the JSON Schema org about ontologies, including a reference to us. The Human Cell Atlas example is interesting as it supports multiple ontologies and even specifies the relation.
@bussec are we comfortable with the CurieMap and InformationProvider objects in the Spec? We are working on an ADC ontology checker that will use the above to validate an NCBITAXON:9606 style of CURIE in a repository. I don't want to develop code against something that is going to change dramatically. I will probably restrict the ontology checker to use the OLS provider (at least for now).
@bcorrie IMO yes... I assume you could cope with a field still changing its name as long as the overall structure is not affected, correct?
> @bcorrie IMO yes... I assume you could cope with a field still changing its name as long as the overall structure is not affected, correct?
Minor changes good, major changes bad... 8-)
One question - the CurieMap and InformationProvider objects are not "Objects" defined in the same way that the other objects are - that is, with a
CURIEMap:
    discriminator: AIRR
    type: object
    properties:
        ...
If we have the above for both of these objects, then when you use the AIRR python library, you automatically can access these objects using the AIRR Schema class. That is if you include the AIRR library you can do this:
# Import AIRR Schema class
from airr.schema import Schema

# Get the schema object for CURIEMap
curiemap_schema = Schema('CURIEMap')

# Process the object as you see fit - in this case collect the OBO IRI
# prefix for every ontology/taxonomy CURIE prefix
ontology_iri_dict = {}
for curie_prefix, values in curiemap_schema.properties.items():
    if values['type'] == 'ontology' or values['type'] == 'taxonomy':
        ontology_iri_dict[curie_prefix] = values['map']['OBO']['iri_prefix']
No handling of AIRR Spec files and processing them - the AIRR library does it for you.
The problem is that the AIRR library expects this basic form for all AIRR Spec objects.
This is simple to add (I already have in my local copy) and I can push if you agree...
I just wrote some very basic code to check Ontology labels, based on how the AIRR Spec defines ontologies. It uses the AIRR Python library and the modified (as above) CurieMap and InformationProvider to build an OLS/OBO query, and then checks the results: https://github.com/sfu-ireceptor/sandbox/tree/master/ontology-check
$ python3 airr-onotlogy.py NCBITAXON:9606 "homo sapiens"
ERROR: Invalid CURIE/label: NCBITAXON:9606, homo sapiens, correct label = Homo sapiens
$ python3 airr-onotlogy.py NCBITAXON:9606 "Homo sapiens"
Valid CURIE and label: NCBITAXON:9606, Homo sapiens
$ python3 airr-onotlogy.py DOID:0080600 COVID-19
Valid CURIE and label: DOID:0080600, COVID-19
First steps toward an ADC Ontology checker... 8-)
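For readers who don't want to open the repository, the comparison step could look roughly like the sketch below. It is not the code from the sandbox repo: check_label is a hypothetical name, the lookup is stubbed with a dict instead of an OLS/OBO query, the "Unknown CURIE" message is made up for illustration, and the exact, case-sensitive match is inferred from the output above.

```python
def check_label(curie, label, lookup_label):
    """Compare a label against the official one for a CURIE.

    lookup_label is any callable that returns the official label for a
    CURIE, or None if the CURIE is unknown (e.g. a wrapper around OLS).
    Returns (is_valid, message).
    """
    correct = lookup_label(curie)
    if correct is None:
        # Hypothetical message, not taken from the original script.
        return False, f"ERROR: Unknown CURIE: {curie}"
    if label != correct:
        return False, (
            f"ERROR: Invalid CURIE/label: {curie}, {label}, "
            f"correct label = {correct}"
        )
    return True, f"Valid CURIE and label: {curie}, {label}"


# Stand-in for a real ontology lookup, using the labels shown above.
KNOWN_LABELS = {"NCBITAXON:9606": "Homo sapiens", "DOID:0080600": "COVID-19"}

for curie, label in [
    ("NCBITAXON:9606", "homo sapiens"),
    ("NCBITAXON:9606", "Homo sapiens"),
    ("DOID:0080600", "COVID-19"),
]:
    print(check_label(curie, label, KNOWN_LABELS.get)[1])
```

Passing the lookup in as a callable keeps the comparison testable without network access; a repository checker would swap in a function that queries the configured provider.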
> One question - the CurieMap and InformationProvider objects are not "Objects" defined in the same way that the other objects are - that is with a
@bcorrie That's correct. The other AIRR objects are schema definitions, kind of like a Class in OO-programming while CURIEMap and InformationProvider are instances, that is they contain actual data instead of defining the structure. We can have a schema definition for CURIEMap, it would look something like this. It looks a bit odd because the keys are not pre-defined (thus the additionalProperties). But remember, you'd still need another object that actually holds the data.
CURIEMap:
    discriminator: AIRR
    type: object
    additionalProperties:
        type: object
        properties:
            type:
                type: string
            default:
                type: object
                properties:
                    map:
                        type: string
                    provider:
                        type: string
            map:
                type: object
                additionalProperties:
                    type: object
                    properties:
                        iri_prefix:
                            type: string
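To illustrate the schema-versus-instance point: an object that actually holds the data, plus a check against the structure above, might look like this. A hand-rolled sketch only; validate_curie_map is a hypothetical helper (a real implementation would more likely hand the schema to a JSON Schema validator), the instance is a trimmed-down example, and since the schema lists no required fields, only the types of fields that are present are checked.

```python
def _check(value, expected, path, errors):
    """Record an error if value is not of the expected type."""
    if not isinstance(value, expected):
        errors.append(f"{path}: expected {expected.__name__}")
        return False
    return True


def validate_curie_map(instance):
    """Return a list of type errors for a CURIEMap instance (empty if OK)."""
    errors = []
    if not _check(instance, dict, "CURIEMap", errors):
        return errors
    # Top-level keys (the CURIE prefixes) are free-form: additionalProperties.
    for prefix, entry in instance.items():
        if not _check(entry, dict, prefix, errors):
            continue
        if "type" in entry:
            _check(entry["type"], str, f"{prefix}.type", errors)
        default = entry.get("default", {})
        if _check(default, dict, f"{prefix}.default", errors):
            for key in ("map", "provider"):
                if key in default:
                    _check(default[key], str, f"{prefix}.default.{key}", errors)
        maps = entry.get("map", {})
        if _check(maps, dict, f"{prefix}.map", errors):
            # Map names are free-form as well (second additionalProperties).
            for name, target in maps.items():
                path = f"{prefix}.map.{name}"
                if _check(target, dict, path, errors) and "iri_prefix" in target:
                    _check(target["iri_prefix"], str, f"{path}.iri_prefix", errors)
    return errors


# Illustrative instance: the data, as opposed to the schema definition.
curie_map_instance = {
    "NCBITAXON": {
        "type": "taxonomy",
        "default": {"map": "OBO", "provider": "OLS"},
        "map": {"OBO": {"iri_prefix": "http://purl.obolibrary.org/obo/NCBITaxon_"}},
    },
}
print(validate_curie_map(curie_map_instance))
```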
> No handling of AIRR Spec files and processing them - the AIRR library does it for you.
The objects are similar to Info, so they do need to be handled explicitly by the AIRR library, regardless of whether a schema definition is defined or not.
It would be nice to add a simple resolve-like function to the AIRR library so users don't have to write their own, at least if they use R or Python. That would also help insulate users from changes.
There's actually an issue to validate ontology fields #503 which would imply that the AIRR library knows how to resolve the CURIE to check.
> There's actually an issue to validate ontology fields #503 which would imply that the AIRR library knows how to resolve the CURIE to check.
I think #503 is to check the validity of a CURIE in terms of its format - that is, it is colon separated and the CURIE prefix exists in the schema, but not whether the CURIE itself is a valid CURIE for that Ontology, correct?
The code I have can do that, and we can probably reuse some of it in the AIRR Library, but my goal with the code above is to create a CURIE checker for the content of an ADC repository.
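A format-only check in the sense described above (colon-separated, prefix known to the schema) could be as small as the sketch below. Assumptions: check_curie_format is a hypothetical name, the regular expression is one plausible reading of "colon separated", and the set of prefixes would in practice come from CURIEMap rather than being hard-coded.

```python
import re

# Assumed shape: a prefix, one colon, then a non-empty local ID
# that contains no whitespace.
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9_]*):(?P<local_id>\S+)$")


def check_curie_format(curie, known_prefixes):
    """True if the CURIE is well-formed and its prefix is known.

    This says nothing about whether the local ID exists in the ontology;
    that needs a lookup against a provider.
    """
    match = CURIE_PATTERN.match(curie)
    if match is None:
        return False
    return match.group("prefix") in known_prefixes


prefixes = {"NCBITAXON", "DOID"}  # would normally be the keys of CURIEMap
print(check_curie_format("NCBITAXON:9606", prefixes))
print(check_curie_format("NCBITAXON9606", prefixes))
```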
> @bcorrie That's correct. The other AIRR objects are schema definitions, kind of like a Class in OO-programming while CURIEMap and InformationProvider are instances, that is they contain actual data instead of defining the structure.
Right, I remember having this discussion earlier, thanks for the reminder - perhaps we should document that in the Spec itself so it is clear to people like me who forget 8-) I wonder - should we have object definitions for these object instances - then we could use the AIRR schema to confirm the validity of those objects?
@bussec did you do some sort of weird merge/push (force-push) above... When I do a pull, my local copies of the ontovoc-4-21 branch complain that I have some minor conflicts in a doc file. When I fix the 1 conflict file, it says I have 17 local changes - when I haven't actually changed anything.
This is on a Linux box, but I had the same situation when I tried to access this on my desktop through Sourcetree...
@bcorrie Yes, I rebased the ontovoc-04-21 branch onto the current tip of master to reduce potential conflicts, as this branch hasn't seen any work for 8 months. This should however not create any conflicts on your side, unless you had changes that were not pushed yet. Please PM me the details.