biothings / mydisease.info

BUG: Limit `parents`, `children`, `ancestors`, and `descendants` in `Mondo` document to `is_a` ontology nodes #44

Closed erikyao closed 1 year ago

erikyao commented 3 years ago

See the discussion in the mychem.info issue "missing ChEBI records?".

The issue boils down to following only `is_a` relationships when we walk the ontology tree to populate the `parents`, `ancestors`, `children`, and `descendants` fields.
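To make the effect concrete, here is a minimal, library-free sketch on a toy ontology. The node IDs, the `related_to` relation name, and the helper functions are all made up for illustration; this is not the parser's actual code, only a demonstration of how a non-`is_a` edge leaks into `children` when every edge type is followed.

```python
from collections import defaultdict

# Toy ontology as (child, relation, parent) triples. All IDs are made up.
EDGES = [
    ("B", "is_a", "A"),
    ("C", "is_a", "B"),
    ("X", "related_to", "A"),  # a term relation, not a subclass link
]

def child_index(edges, relations=None):
    """Map each parent to its set of children, optionally keeping only some relations."""
    index = defaultdict(set)
    for child, relation, parent in edges:
        if relations is None or relation in relations:
            index[parent].add(child)
    return index

def descendants(index, node):
    """Collect every node reachable by repeatedly following child links."""
    seen, stack = set(), [node]
    while stack:
        for child in index.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

all_relations = child_index(EDGES)
is_a_only = child_index(EDGES, relations={"is_a"})

print(sorted(all_relations["A"]))           # ['B', 'X']: X leaks in via related_to
print(sorted(is_a_only["A"]))               # ['B']
print(sorted(descendants(is_a_only, "A")))  # ['B', 'C']
```

Filtering while building the index leaves the original edge list untouched, so the other relations remain available for later use.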

The following code can be added to the Mondo parser (line 35) to fix this issue:

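# Assuming `graph` is a networkx MultiDiGraph whose edge keys hold the
# relationship type (e.g. "is_a"):
#     graph.edges() yields (u, v) pairs
#     graph.edges(keys=True) yields (u, v, key) triples
# Keep the key in each tuple: a bare (u, v) pair removes an arbitrary edge
# between u and v, which could be the is_a edge itself.
edges_to_remove = [
    (u, v, key) for (u, v, key) in graph.edges(keys=True) if key != "is_a"
]

# In-place operation that returns None. Copy the graph first if the original is still needed.
graph.remove_edges_from(edges_to_remove)

andrewsu commented 2 years ago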

@erikyao Just ran into this issue again -- can you look into adding / testing / deploying the fix you suggest above? At a quick glance, it seems like a reasonable quick fix. (Though I think ideally we would move those non-is_a relations to another key in the document, as described in my linked comment above...)

The specific example I'm looking at is http://mydisease.info/v1/disease/MONDO:0005260 for autism. We incorrectly list these four children:

All of these are from the "term relations" part of https://www.ebi.ac.uk/ols/ontologies/mondo/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMONDO_0005260

andrewsu commented 1 year ago

Bumping this as a high-priority fix, given its effect on Translator results.

erikyao commented 1 year ago

> Though I think ideally we would move those non-is_a relations to another key in the document, as described in my linked comment above...

Looks like we did use those non-is_a relationships in the parser; see here.

Solution: preserve a copy of those non-is_a relationships before removing them from the calculation of `parents`, `children`, `ancestors`, and `descendants`.
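A rough sketch of that idea, assuming edges are available as (child, relation, parent) triples. The function names, the `other_relations` key, and the sample IDs are hypothetical and chosen only for illustration; the actual change is the one in the linked PR.

```python
from collections import defaultdict

def split_relations(edges):
    """Separate is_a edges from all other relationships.

    `edges` is an iterable of (child, relation, parent) triples. Returns the
    is_a edges plus a per-node record of the remaining relationships, so that
    they can be stored elsewhere instead of being lost.
    """
    is_a_edges = []
    other = defaultdict(lambda: defaultdict(list))
    for source, relation, target in edges:
        if relation == "is_a":
            is_a_edges.append((source, target))
        else:
            other[source][relation].append(target)
    return is_a_edges, {node: dict(rels) for node, rels in other.items()}

def build_documents(edges):
    """Build per-node documents whose parents/children come from is_a edges only."""
    is_a_edges, other = split_relations(edges)
    docs = defaultdict(lambda: {"parents": [], "children": []})
    for child, parent in is_a_edges:
        docs[child]["parents"].append(parent)
        docs[parent]["children"].append(child)
    for node, rels in other.items():
        # Hypothetical key name for the preserved non-is_a relationships.
        docs[node]["other_relations"] = rels
    return dict(docs)

docs = build_documents([("B", "is_a", "A"), ("X", "related_to", "A")])
print(docs["A"]["children"])         # ['B']
print(docs["X"]["other_relations"])  # {'related_to': ['A']}
```

Splitting first and building the hierarchy second keeps the two concerns apart: the tree fields see only `is_a` edges, while the other relationships stay in the document under their own key.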

erikyao commented 1 year ago

Fixed and deployed with PR https://github.com/biothings/mydisease.info/pull/54.

See MONDO:0005260 for a spot check.

andrewsu commented 1 year ago

Looks good!