biopragmatics / biomappings

🗺️ Community curated and predicted equivalences and related mappings between named biological entities that are not available from primary sources.
https://biopragmatics.github.io/biomappings/

Add CLO processing notebook #133

Closed: cthoyt closed this 1 year ago

cthoyt commented 1 year ago

The Cell Line Ontology (CLO) is a detailed resource; however, it does not follow the standard OBO modeling pattern for cross-references, which uses either a predicate from SKOS or oboInOwl:hasDbXref to point to a single CURIE encoded as a string. Instead, it uses rdfs:seeAlso with a combination of non-standard CURIEs that are either comma- or semicolon-delimited.
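To make the parsing problem concrete, here is a minimal Python sketch of splitting one delimited rdfs:seeAlso string into individual references. It is not the notebook's actual code: the function names `split_see_also` and `parse_reference` and the example string are hypothetical, and lowercasing the prefix is only an assumed stand-in for real prefix normalization (for example, against a registry such as Bioregistry).

```python
import re
from typing import List, Optional, Tuple

# Assumption: references are separated by commas or semicolons, as described above.
_DELIMITERS = re.compile(r"[;,]")


def split_see_also(value: str) -> List[str]:
    """Split a delimited rdfs:seeAlso string into trimmed, non-empty parts."""
    return [part.strip() for part in _DELIMITERS.split(value) if part.strip()]


def parse_reference(part: str) -> Optional[Tuple[str, str]]:
    """Split a CURIE-like string on its first colon.

    Returns (prefix, identifier), or None when the string has no usable colon.
    The prefix is lowercased as a placeholder for proper normalization.
    """
    prefix, sep, identifier = part.partition(":")
    prefix, identifier = prefix.strip(), identifier.strip()
    if not sep or not prefix or not identifier:
        return None
    return prefix.lower(), identifier


# Hypothetical input; real CLO annotations may look different.
example = "MESH:D0001; BTO: 0000123, Cellosaurus:CVCL_0001"
references = [parse_reference(part) for part in split_see_also(example)]
```

Strings that do not parse come back as `None`, so they can be collected and reviewed by hand rather than silently dropped.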

Depends on:

cthoyt commented 1 year ago

@bgyori CLO kind of has mappings available, but they need serious processing effort to get at. How should this relate to biomappings? Should we import them directly into the "positive" mappings file? Or should we put them in the "predicted" mappings file and then allow a second round of curation?

@matentzn I also wasn't sure what the right semapv tag was for things extracted from an ontology. In theory, they are manually curated, but there's no actual evidence of how they were made, so I don't think it's fair to assume that.

matentzn commented 1 year ago

@cthoyt I have struggled with the same! I would keep the mapping predicate as oboInOwl:hasDbXref which already provides a warning sign, and record the semapv:UnspecifiedMatching as the mapping_justification.
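As an illustration of this suggestion, the sketch below builds one SSSOM-style row that keeps oboInOwl:hasDbXref as the predicate and records semapv:UnspecifiedMatching as the justification. The column names follow the SSSOM standard; the helper names `make_row` and `to_tsv` and the example identifiers are hypothetical, and this is not the Biomappings file format itself.

```python
import csv
import io
from typing import Dict, Iterable

COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def make_row(subject_id: str, object_id: str) -> Dict[str, str]:
    """Build a mapping row for an xref extracted from an ontology as-is."""
    return {
        "subject_id": subject_id,
        # Keep the weak predicate as a warning sign about the precision.
        "predicate_id": "oboInOwl:hasDbXref",
        "object_id": object_id,
        # No evidence of how the mapping was made, so do not claim curation.
        "mapping_justification": "semapv:UnspecifiedMatching",
    }


def to_tsv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder identifiers, for illustration only.
tsv_text = to_tsv([make_row("CLO:0000000", "mesh:D000000")])
```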

bgyori commented 1 year ago

@cthoyt it looks like this PR adds xrefs from CLO as predictions and allows us to review and curate them manually to add them to mappings. I think this is appropriate under the assumption that a considerable number of these mappings are non-exact and therefore require review. If, however, we assume that the xrefs CLO provides are almost all actual exact equivalences, then going through Biomappings shouldn't be necessary. From my cursory spot checks, these xrefs look like exact mappings. So perhaps a better path forward would be to change CLO's representation to move these into proper xrefs rather than "see also" relations?

cthoyt commented 1 year ago

@bgyori, agreed, a lot of the ones that point to MeSH, BTO, and Cellosaurus are pretty high quality. Therefore, I moved the processing functionality into SeMRA.

I also added a many-to-many finder pipeline in SeMRA to assess the situation in CLO - I found 26 mappings that I want to manually curate and include in Biomappings as some must be non-exact.
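For readers unfamiliar with the idea, the following is a rough sketch of how mappings that are not one-to-one could be flagged for manual review. It is not SeMRA's pipeline or API: the function `find_non_one_to_one` is hypothetical, the identifiers are made up, and it assumes that a mapping is suspicious when a subject maps to more than one object in the same target namespace or when an object is mapped from more than one subject.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Mapping = Tuple[str, str]  # (subject CURIE, object CURIE)


def find_non_one_to_one(mappings: List[Mapping]) -> List[Mapping]:
    """Return mappings taking part in a one-to-many or many-to-one group."""
    objects_by_subject: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    subjects_by_object: Dict[str, Set[str]] = defaultdict(set)
    for subject, obj in mappings:
        target_prefix = obj.split(":", 1)[0]
        # Group objects per subject and per target namespace, since mapping
        # one entity to several resources is expected and not a problem.
        objects_by_subject[subject, target_prefix].add(obj)
        subjects_by_object[obj].add(subject)
    return [
        (subject, obj)
        for subject, obj in mappings
        if len(objects_by_subject[subject, obj.split(":", 1)[0]]) > 1
        or len(subjects_by_object[obj]) > 1
    ]


# Made-up identifiers, for illustration only.
example_mappings = [
    ("CLO:1", "mesh:A"),
    ("CLO:1", "mesh:B"),  # CLO:1 maps to two MeSH terms
    ("CLO:2", "mesh:C"),
    ("CLO:3", "bto:X"),
    ("CLO:4", "bto:X"),  # two CLO terms map to the same BTO term
    ("CLO:1", "bto:Y"),
]
flagged = find_non_one_to_one(example_mappings)
```

In a group like this, at least one mapping is likely to be non-exact (for example, broader or narrower), which is why such cases are candidates for curation rather than direct import.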

For the rest of the mappings that could be exact, I asked in https://github.com/CLO-ontology/CLO/issues/103 if we can turn some of these into proper xrefs.

cthoyt commented 1 year ago

See also https://github.com/CLO-ontology/CLO/issues/104