Phenomics / ontolib

A modern Java library for working with (biological) ontologies.
https://ontolib.readthedocs.org

Are terms in Termmap unique? #34

Open pnrobinson opened 6 years ago

pnrobinson commented 6 years ago

I am getting a strange error when using the term map of ontolib. My code is roughly as follows:

Ontology<HpoTerm, HpoTermRelation> ontology = null;
HpoOntologyParser parser = new HpoOntologyParser(pathToHpoOboFile);
try {
    parser.parseOntology();
    ontology = parser.getPhenotypeSubontology();
} catch (IOException e) {
    // no problems occur here
}
// Index the terms by name; ImmutableMap.Builder rejects duplicate keys in build()
ImmutableMap.Builder<String, HpoTerm> termmap = new ImmutableMap.Builder<>();
ontology.getTermMap().values().forEach(term -> termmap.put(term.getName(), term));
return termmap.build();

This causes the following error (excerpt):

Caused by: java.lang.IllegalArgumentException: Multiple entries with same key: Micropenis=HPOTerm [id=ImmutableTermId [prefix=ImmutableTermPrefix [value=HP], id=0000054], altTermIds=[ImmutableTermId [prefix=ImmutableTermPrefix [value=HP], id=0000038]], name=Micropenis, definition=Abnormally small penis. At birth, the normal penis is about 3 cm (stretched length from pubic tubercle to tip of penis) with micropenis less than 2.0-2.5 cm., comment=null, subsets=[], synonyms=[ImmutableTermSynonym [value=Short penis, scope=EXACT, synonymTypeName=layperson, termXrefs=[]], ImmutableTermSynonym [value=Small penis, scope=EXACT, synonymTypeName=layperson, termXrefs=[]]], obsolete=false, createdBy=null, creationDate=null, xrefs=[ImmutableDbxref [name=SNOMEDCT_US:34911001, description=null, trailingModifiers=null], ImmutableDbxref [name=UMLS:C0266435, description=null, trailingModifiers=null]]] and Micropenis=HPOTerm [id=ImmutableTermId [prefix=ImmutableTermPrefix [value=HP], id=0000054], altTermIds=[ImmutableTermId [prefix=ImmutableTermPrefix [value=HP], id=0000038]], name=Micropenis, definition=Abnormally small penis. At birth, the normal penis is about 3 cm (stretched length from pubic tubercle to tip of penis) with micropenis less than 2.0-2.5 cm., comment=null, subsets=[], synonyms=[ImmutableTermSynonym [value=Short penis, scope=EXACT, synonymTypeName=layperson, termXrefs=[]], ImmutableTermSynonym [value=Small penis, scope=EXACT, synonymTypeName=layperson, termXrefs=[]]], obsolete=false, createdBy=null, creationDate=null, xrefs=[ImmutableDbxref [name=SNOMEDCT_US:34911001, description=null, trailingModifiers=null], ImmutableDbxref [name=UMLS:C0266435, description=null, trailingModifiers=null]]]

i.e., there are multiple copies of term HP:0000054.
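
One possible explanation, offered as an assumption rather than confirmed ontolib behavior: both entries in the error are identical and list HP:0000038 under altTermIds, which would fit a term map that registers the same term object under its primary ID and under each alternate ID. The sketch below reproduces that situation with the standard library only. `Term`, `termMapWithAltIds`, and `indexByName` are hypothetical stand-ins, and `Collectors.toMap` plays the role of Guava's `ImmutableMap.Builder` (it throws `IllegalStateException` on a repeated key, where Guava throws `IllegalArgumentException`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateValuesSketch {
    /** Hypothetical stand-in for HpoTerm; only the fields needed here. */
    static final class Term {
        final String id;
        final String name;
        Term(String id, String name) { this.id = id; this.name = name; }
    }

    /** ID-keyed map in which an alternate ID points at the same object as the primary ID (assumed layout). */
    static Map<String, Term> termMapWithAltIds() {
        Term micropenis = new Term("HP:0000054", "Micropenis");
        Term root = new Term("HP:0000118", "Phenotypic abnormality");
        Map<String, Term> byId = new LinkedHashMap<>();
        byId.put("HP:0000054", micropenis); // primary ID
        byId.put("HP:0000038", micropenis); // alternate ID, same term object
        byId.put("HP:0000118", root);
        return byId;
    }

    /** Indexes the map's values by name; throws IllegalStateException if a name repeats. */
    static Map<String, Term> indexByName(Map<String, Term> byId) {
        return byId.values().stream()
                .collect(Collectors.toMap(t -> t.name, t -> t));
    }

    public static void main(String[] args) {
        Map<String, Term> byId = termMapWithAltIds();
        System.out.println(byId.values().size()); // 3 values for only 2 distinct terms
        try {
            indexByName(byId);
        } catch (IllegalStateException e) {
            System.out.println("duplicate name: " + e.getMessage());
        }
    }
}
```

If this is what happens, the terms themselves are unique and only the values() view repeats them, once per ID that points at them.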

If I instead use the following code to remove the multiple copies:

(...the same...)
List<HpoTerm> res = ontology.getTermMap().values().stream()
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet()
                    .stream()
                    .filter(e -> e.getValue() == 1)
                    .map(e -> e.getKey())
                    .collect(Collectors.toList());
res.forEach(term -> termmap.put(term.getName(), term));
return termmap.build();

then everything is fine. Is this a bug in the ontology.getTermMap() function?
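
One caveat about the stream pipeline above: filtering on a count of exactly 1 does not keep one copy of each repeated term, it discards every term that occurs more than once, so a term such as Micropenis would be missing from the resulting map. A minimal sketch of the difference, using plain strings in place of HpoTerm objects (the class and method names are hypothetical), compares that filter with `Stream.distinct()`, which keeps the first occurrence of each element:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DedupSketch {
    /** Keeps only elements that occur exactly once, as in the pipeline from the issue. */
    static <T> List<T> onlyUnrepeated(List<T> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() == 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Keeps one copy of every element, in encounter order. */
    static <T> List<T> oneOfEach(List<T> items) {
        return items.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Micropenis", "Micropenis", "Hypospadias");
        System.out.println(onlyUnrepeated(names)); // [Hypospadias]
        System.out.println(oneOfEach(names));      // [Micropenis, Hypospadias]
    }
}
```

`distinct()` relies on `equals` and `hashCode`; when the repeated values are literally the same object, the default identity-based implementations are enough.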

holtgrewe commented 6 years ago

Can you make an "MWE", i.e., a .java file that I can compile to reproduce the error?

pnrobinson commented 6 years ago

I sent an MWE via email/googledrive, let me know if you got it.