Phenomics / ontolib

A modern Java library for working with (biological) ontologies.
https://ontolib.readthedocs.org

get MICA #40

Closed pnrobinson closed 6 years ago

pnrobinson commented 6 years ago

I would like to have the following function in ontolib (this is a first draft and is not ready for incorporation into the library as is). Actually, to get the MICA (most informative common ancestor) we need to know the information content of the terms; the path length computed below is not correct in all cases, since it does not reflect information content faithfully. @holtgrewe Do you have any advice about where to go with this?

// Needs java.util.ArrayDeque, Deque, HashMap, Iterator, Map, Set
private int dijkstra2pathlength(TermId source, TermId target, DirectedGraph dag, Set<TermId> candidates) {
        if (source.equals(target)) {
            return 0;
        }
        // Every edge has weight 1, so Dijkstra reduces to a breadth-first search.
        // (A TreeMap keyed by TermId sorts by term ID, not by distance, so firstKey()
        // would not return the closest vertex.)
        Map<TermId, Integer> vertex2distance = new HashMap<>();
        Deque<TermId> queue = new ArrayDeque<>();
        vertex2distance.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            TermId u = queue.remove(); // FIFO order: vertices come out by increasing distance
            int dist_v = vertex2distance.get(u) + 1; // each neighbor v is one edge away from u
            Iterator<HpoTermRelation> it = dag.outEdgeIterator(u);
            while (it.hasNext()) { // follow every outgoing edge, not only the first one
                TermId vertex = it.next().getDest();
                if (vertex.equals(target)) {
                    // we found the path, and the overall length of the path is dist_v
                    return dist_v;
                }
                // only expand candidate vertices that have not been reached before
                if (candidates.contains(vertex) && !vertex2distance.containsKey(vertex)) {
                    vertex2distance.put(vertex, dist_v);
                    queue.add(vertex);
                }
            }
        }
        // We should never get here, but...
        logger.fatal("Should never happen--we failed to find shortest path although one must exist");
        return Integer.MAX_VALUE;
    }

and

public Pair<TermIdWithMetadata, Integer> getMICA(TermId queryTerm, Ontology<HpoTerm, HpoTermRelation> phenotypeSubOntology) {
        DirectedGraph dag = phenotypeSubOntology.getGraph();
        // We want the collection of vertices made up of all ancestors of the query term and all
        // ancestors of the disease terms.
        ImmutableSet.Builder<TermId> isb = new ImmutableSet.Builder<>();
        isb.addAll(phenotypeSubOntology.getAncestorTermIds(queryTerm));
        for (TermId ptid : phenotypicAbnormalities) {
            isb.addAll(phenotypeSubOntology.getAncestorTermIds(ptid));
        }
        ImmutableSet<TermId> allAncestors = isb.build();

        for (TermIdWithMetadata ptid : phenotypicAbnormalities) {
            if (phenotypeSubOntology.getAncestorTermIds(ptid).contains(queryTerm)) {
                // the query term is an ancestor of this disease annotation
                // we now want to know the path length that separates the two terms.
                // we know that the path starts at ptid and ends at query term since
                // is_a links point from descendant to ancestor
                // we have found a MICA that is an ancestor of both the query term and the disease terms
                // Use Dijkstra to get path length
                int k = dijkstra2pathlength(ptid, queryTerm, dag, allAncestors);
                return new Pair<>(ptid, k);
            }
        }
        return null;
    }
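
For comparison, here is a minimal sketch of a MICA lookup that ranks common ancestors by information content instead of path length. It does not use the ontolib API: terms are plain strings, and the class and method names (MicaSketch, ancestors, informationContent, mica) as well as the toy ontology in main are hypothetical. It assumes IC(t) = ln(N / n_t), where n_t is the number of items annotated to t or to any descendant of t and N is the total number of items.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MicaSketch {

    /** Returns the term itself plus all of its ancestors, following is_a links upward. */
    static Set<String> ancestors(String term, Map<String, List<String>> parents) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(term);
        while (!stack.isEmpty()) {
            String t = stack.pop();
            if (seen.add(t)) {
                for (String p : parents.getOrDefault(t, List.of())) {
                    stack.push(p);
                }
            }
        }
        return seen;
    }

    /** Computes IC(t) = ln(N / n_t) for every term that has at least one annotated item. */
    static Map<String, Double> informationContent(Map<String, Set<String>> itemToTerms,
                                                  Map<String, List<String>> parents) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> terms : itemToTerms.values()) {
            // Propagate annotations to all ancestors; each item counts once per term.
            Set<String> implied = new HashSet<>();
            for (String t : terms) {
                implied.addAll(ancestors(t, parents));
            }
            for (String t : implied) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        double total = itemToTerms.size();
        Map<String, Double> ic = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            ic.put(e.getKey(), Math.log(total / e.getValue()));
        }
        return ic;
    }

    /** Returns the common ancestor of a and b with the highest IC, or null if there is none. */
    static String mica(String a, String b, Map<String, List<String>> parents,
                       Map<String, Double> ic) {
        Set<String> common = ancestors(a, parents);
        common.retainAll(ancestors(b, parents));
        String best = null;
        for (String t : common) {
            if (!ic.containsKey(t)) {
                continue; // terms without annotations have no defined IC here
            }
            if (best == null || ic.get(t) > ic.get(best)) {
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy ontology: C is_a A, D is_a A, D is_a B, A is_a root, B is_a root.
        Map<String, List<String>> parents = Map.of(
                "C", List.of("A"),
                "D", List.of("A", "B"),
                "A", List.of("root"),
                "B", List.of("root"));
        // Toy annotations: four items, each annotated to one term.
        Map<String, Set<String>> itemToTerms = Map.of(
                "d1", Set.of("C"),
                "d2", Set.of("D"),
                "d3", Set.of("B"),
                "d4", Set.of("D"));
        Map<String, Double> ic = informationContent(itemToTerms, parents);
        System.out.println(mica("C", "D", parents, ic)); // prints A
    }
}
```

In the toy data, C and D share the ancestors A and root. The root covers all four items, so its IC is 0, while A covers three of four, so A is picked. With real annotations the winner depends on annotation frequencies rather than on how many edges separate the terms, which is why path length alone can mislead.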

drseb commented 6 years ago

This is already implemented.

pnrobinson commented 6 years ago

OK, I see it here: https://ontolib.readthedocs.io/en/latest/tutorial_similarity.html

drseb commented 6 years ago

Sorry, that wasn't supposed to sound unfriendly. I just wanted to keep you from implementing this again, and I was only on mobile. I see you found it on your own.

pnrobinson commented 6 years ago

Thanks, I know you often reply from mobile, no problem :-0 I will close this issue.