dhimmel / obonet

OBO-formatted ontologies → networkx (Python 3)
https://github.com/dhimmel/obonet/blob/main/examples/go-obonet.ipynb

Extract replaced_by mappings for obsolete terms #18

Closed csbayrak closed 3 years ago

csbayrak commented 3 years ago

Hi, I'd like to use obonet to read HPO terms (http://purl.obolibrary.org/obo/hp.obo). I noticed that obsolete terms are ignored when reading the file. I'd prefer to use their corresponding updated names.

Example:

[Term]
id: HP:0000547
name: obsolete Tapetoretinal degeneration
synonym: "Retinotapetal degeneration" EXACT []
is_obsolete: true
replaced_by: HP:0000510

Is there an easy way to extract and use that mapping?

dhimmel commented 3 years ago

https://github.com/dhimmel/obonet/pull/15 added an ignore_obsolete option to read_obo. Setting ignore_obsolete=False will retain these nodes.

I don't think ontologies generally include edges for obsolete nodes, but I'm not entirely sure.

Does this address your use case? Are you trying to do any of the following:

  1. replace obsolete HP terms with their replacements?
  2. look up names for obsolete HP terms?

csbayrak commented 3 years ago

Thank you for your reply! I'm interested in getting descendants of a list of HP terms. If it is obsolete, I'd like to get the descendants of its replacement. I guess (1) makes sense.

dhimmel commented 3 years ago

> I'm interested in getting descendants of a list of HP terms. If it is obsolete, I'd like to get the descendants of its replacement.

Sounds like you need a two step approach:

  1. replace obsolete terms in your query list with their replacements (if they have replacements, not all obsolete terms do)
  2. find descendants of the non-obsolete terms

In https://github.com/dhimmel/obonet/commit/0ce7c81d7206ada2bff5204f3158847aa195dfe2, I updated the example notebook with an example of how to create an old_to_new dictionary that maps obsolete terms to their replacements. Here is the key code:

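# graph_with_obs: graph read with ignore_obsolete=False so obsolete nodes are retained
old_to_new = {}
for node, data in graph_with_obs.nodes(data=True):
    # replaced_by is a list; if a term lists several replacements, the last one wins
    for replaced_by in data.get("replaced_by", []):
        old_to_new[node] = replaced_by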

Hopefully this helps you achieve part 1. There is also code for part 2 in the notebook.
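As a rough sketch of how part 1 could be applied to a query list: the helper name replace_obsolete and the sample query below are hypothetical, not part of obonet or the notebook. Only the HP:0000547 to HP:0000510 pair comes from the example earlier in this thread. The sketch assumes that obsolete terms without a replacement should be left in the list unchanged; you may prefer to drop them instead.

```python
def replace_obsolete(terms, old_to_new):
    """Swap obsolete term IDs for their replacements.

    Keeps the original order and drops duplicates that arise when an
    obsolete term and its replacement are both in the query list.
    """
    updated = []
    for term in terms:
        # Terms missing from the mapping are kept as they are.
        new_term = old_to_new.get(term, term)
        if new_term not in updated:
            updated.append(new_term)
    return updated


# Hypothetical inputs for illustration only.
old_to_new = {"HP:0000547": "HP:0000510"}
query = ["HP:0000547", "HP:0000510", "HP:0000478"]
updated_query = replace_obsolete(query, old_to_new)
```

The resulting updated_query can then be passed to whatever code you use for part 2.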

Note that in obonet, directed edges go from child to parent term, so the networkx descendants and ancestors functions do the opposite of what their names suggest for an ontology: networkx.descendants returns a term's superterms, and networkx.ancestors returns its subterms. You could always reverse all the edges using networkx.MultiDiGraph.reverse if this becomes annoying.
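To make the edge direction concrete, here is a small sketch on a toy graph. The TOY: identifiers and the three-term hierarchy are made up for illustration; the graph is built by hand with child-to-parent edges to mimic what obonet produces, rather than read from a real OBO file.

```python
import networkx as nx

# Toy ontology with made-up IDs; edges point from child to parent.
# TOY:3 is_a TOY:2, and TOY:2 is_a TOY:1 (the root).
graph = nx.MultiDiGraph()
graph.add_edge("TOY:2", "TOY:1", key="is_a")
graph.add_edge("TOY:3", "TOY:2", key="is_a")

# networkx.ancestors collects nodes that can reach the given node,
# which with child-to-parent edges are its subterms.
subterms = nx.ancestors(graph, "TOY:1")

# networkx.descendants collects nodes reachable from the given node,
# which with child-to-parent edges are its superterms.
superterms = nx.descendants(graph, "TOY:3")

# After reversing, edges point from parent to child, so
# networkx.descendants returns subterms as the name suggests.
reversed_graph = graph.reverse()
subterms_reversed = nx.descendants(reversed_graph, "TOY:1")
```

Whether to reverse the graph or just swap the two function calls is a matter of taste; reversing makes a copy by default, which may matter for large ontologies.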

csbayrak commented 3 years ago

Perfect! Thank you very much for your help. This is so helpful!