Phenomics / ontolib

A modern Java library for working with (biological) ontologies.
https://ontolib.readthedocs.org
Other
9 stars 2 forks source link

? Bug in getPhenotypicAbnormalitySubOntology(); #28

Open pnrobinson opened 7 years ago

pnrobinson commented 7 years ago

The function getPhenotypicAbnormalitySubOntology() should be returning an ontology that just has terms from this subontology. And yet it apparently returns the inheritance terms. On the other hand, the function Set ancestors = ontology.getAllAncestorTermIds(myset); crashes if myset contains the Term HP:0000006, Autosomal doimant inheritance. I am not sure I understand this issue, but here is some code that can be used to reproduce the problem I am seeing. If you use the code, then path and hpopath need to be modified accordingly.


package org.monarchinitiative.lr2pg.likelihoodratio;

import com.github.phenomics.ontolib.formats.hpo.HpoDiseaseAnnotation;
import com.github.phenomics.ontolib.formats.hpo.HpoOntology;
import com.github.phenomics.ontolib.formats.hpo.HpoTerm;
import com.github.phenomics.ontolib.formats.hpo.HpoTermRelation;
import com.github.phenomics.ontolib.io.obo.hpo.HpoOboParser;
import com.github.phenomics.ontolib.ontology.data.*;
import org.junit.Test;
import org.monarchinitiative.lr2pg.prototype.Disease;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;

public class HPO2LRTest {
    private String path = "/home/robinp/data/hpo/phenotype_annotation.tab";
    private String hpopath="/home/robinp/data/hpo/hp.obo";
    private Ontology<HpoTerm, HpoTermRelation> ontology=null;
    /** Map of all of the Phenotypic abnormality terms (i.e., not the inheritance terms). */
    private Map<TermId,HpoTerm> termmap=null;

    /** List of all annotations parsed from phenotype_annotation.tab. */
    private List<HpoDiseaseAnnotation> annotList=null;

    private Map<String,Disease> diseaseMap=null;

    @Test
    public void findBadTerm() {
        // This initializes ontology (which should have just phenotype terms)
        parseHPOData(hpopath,path);
        // This gets the four TermIds for OMIM 613172 and puts them in a list
        List<TermId> ids = getTermsFromDisease();
        System.out.println(String.format("*** Looking for the %d ancestors of the TermId's used to annotate MIM:613172 ***",ids.size()));

        for (TermId id:ids) {
            System.out.println(String.format("Testing term id %s at %s",id.toString(),termmap.get(id).getName() ));
            if (this.termmap.containsKey(id)) {
                System.out.println(String.format("TermId %s (%s) was found in PhenotypicAbnormality subontology", termmap.get(id).getName(), termmap.get(id).getId()));
            } else {
                System.out.println(String.format("TermId %s was NOT found in PhenotypicAbnormality subontology", id.toString()));
            }
            Set<TermId> uniset = new HashSet<>();
            uniset.add(id);
            try {
                Set<TermId> ancestors = ontology.getAllAncestorTermIds(uniset);
            } catch (Exception e) {
                e.printStackTrace();
            }

        }
    }

    @Test
    public void testAutosomalDominantInheritance() {
        TermPrefix pref = new ImmutableTermPrefix("HP");
        TermId autosomalDominantId = new ImmutableTermId(pref,"0000006");
        parseHPOData(hpopath,path);
        System.out.println("About to test term: "+ autosomalDominantId.toString());
        Set<TermId> myset = new HashSet<>();
        myset.add(autosomalDominantId);
        try {
            Set<TermId> ancestors = ontology.getAllAncestorTermIds(myset);
            System.out.println(String.format("We extracted %d ancesters -OK!!",ancestors.size()));
        } catch (Exception e) {
            System.out.println("We crashed on Autosomal dominant");
            e.printStackTrace();
        }
    }

    private void parseHPOData(String HPOpath, String annotation) {
        HpoOntology hpoOntology;
        try {
            HpoOboParser hpoOboParser = new HpoOboParser(new File(HPOpath));
            hpoOntology = hpoOboParser.parse();
            this.ontology =  hpoOntology.getPhenotypicAbnormalitySubOntology();
            this.termmap=ontology.getTermMap();

        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

    public List<TermId> getTermsFromDisease() {
        TermPrefix pref = new ImmutableTermPrefix("HP");
        List<TermId> idlist = new ArrayList<>();
        try {
            BufferedReader br = new BufferedReader(new FileReader(path));
            String line=null;

            while ((line=br.readLine())!=null) {
                if (! line.startsWith("OMIM"))
                    continue;
                String A[] = line.split("\t");
                String id=A[1];
                if (id.equals("613172")) {
                    String HPid = A[4];
                    if (HPid.startsWith("HP:")) {
                        HPid=HPid.substring(3);
                        TermId tid = new ImmutableTermId(pref,HPid);
                        idlist.add(tid);
                    }
                    //System.out.println(line + "\n\t"+HPid);

                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return idlist;
    }

}
pnrobinson commented 7 years ago

I think that getNonObsoleteTermIds(); is showing the expected behavior and that getMap is showing terms from the original ontology. This is unexpected behavior!

drseb commented 6 years ago

Not sure what you mean by the function crashes.

Here is a running example, is this doing what you were trying to do? It is not crashing

        HpoOboParser hpoOboParser = new HpoOboParser(new File(hpopath));
        ImmutableOntology<HpoTerm, HpoTermRelation> phenoSubOntology = hpoOboParser.parse().getPhenotypicAbnormalitySubOntology();
        Map<TermId, HpoTerm> termmap = phenoSubOntology.getTermMap();

        // test 1
        TermPrefix pref = new ImmutableTermPrefix("HP");
        TermId autosomalDominantId = new ImmutableTermId(pref, "0000006");
        System.out.println(termmap.get(autosomalDominantId));

        // test 2
        ArrayList<TermId> termids = new ArrayList<>();
        termids.add(autosomalDominantId);
        TermId anotherTermId = new ImmutableTermId(pref, "0000123");
        termids.add(anotherTermId);
        Set<TermId> ancestors = phenoSubOntology.getAllAncestorTermIds(termids);
        System.out.println(ancestors);

@holtgrewe I find it irritating that the the map produced by getTermMap (test 1) contains Terms that are not in the sub-ontology selected in the step before.

pnrobinson commented 6 years ago

@drseb The fact that "the map produced by getTermMap (test 1) contains Terms that are not in the sub-ontology selected in the step before" was indeed the problem that was leading to a difficult to debug crash (run time error). I am not sure if the test class crashes. I think that the problem is that a shallow copy is being done (i.e., the term map is copy the reference to the term map from the bigger ontology). It would be better to do a deep copy of the term map when we create a subontology.

pnrobinson commented 6 years ago

@drseb @holtgrewe I think I fixed the bug. The function subOntology in the class com.github.phenomics.ontolib.ontology.data.ImmutableOntology was passing the original termMap object, which contains all of the terms of the original ontology -- it should contain only the terms of the new subontology.

This is the revised function

  @Override
  public Ontology<T, R> subOntology(TermId subOntologyRoot) {
    final Set<TermId> childTermIds = OntologyTerms.childrenOf(subOntologyRoot, this);
    final ImmutableDirectedGraph<TermId, ImmutableEdge<TermId>> subGraph =
        (ImmutableDirectedGraph<TermId, ImmutableEdge<TermId>>) graph.subGraph(childTermIds);
    Set<TermId> intersectingTerms = Sets.intersection(nonObsoleteTermIds,childTermIds);
    final ImmutableMap.Builder<TermId,T> lb = ImmutableMap.builder();
    for (final TermId tid : intersectingTerms) {
      lb.put(tid,termMap.get(tid));
    }
    return new ImmutableOntology<T, R>(metaInfo, subGraph, subOntologyRoot,
      intersectingTerms,
        Sets.intersection(obsoleteTermIds, childTermIds), lb.build(), relationMap);
  }

This is a new test for com.github.phenomics.ontolib.ontology.data.ImmutableOntologyTest

 /**
   * The subontology defined by Term with id4 should consist of only the terms id4 and id1.
   * The termmap should thus contain only two terms. The subontology does not contain the original root of the ontology, id5.
   */
  @Test
  public void testSubontologyCreation() {
    ImmutableOntology<TestTerm, TestTermRelation> subontology=(ImmutableOntology<TestTerm, TestTermRelation>)ontology.subOntology(id4);
    Assert.assertTrue(subontology.getTermMap().containsKey(id4));
    Assert.assertTrue(subontology.getTermMap().containsKey(id1));
    Assert.assertTrue(ontology.getTermMap().size()==5);
    Assert.assertTrue(subontology.getTermMap().size()==2);
    Assert.assertFalse(subontology.getTermMap().containsKey(id5));
  }

I have not yet addressed whether the relationMap needs to be treated in this way also. All other unit tests still pass.