Open matentzn opened 6 months ago
The KeyError is thrown by the symmetrize_lineage method in the pronto.parsers.base.BaseParser class:
    def symmetrize_lineage(self):
        for getter in self._entities.values():
            entities, graphdata = getter(self.ont)
            for entity in entities():
                graphdata.lineage.setdefault(entity.id, Lineage())
            for subentity, lineage in graphdata.lineage.items():
                for superentity in lineage.sup:
                    graphdata.lineage[superentity].sub.add(subentity)
which is itself called at the end of the OBO parser parse_from method:
    def parse_from(self, handle, threads=None):
        […]
        # Update lineage cache with symmetric of `SubClassOf`
        self.symmetrize_lineage()
Overall, the code seems to assume that when a class is a subclass of another, the parent class must exist somewhere in the graph. This does not take into account the possibility of dangling is_a references, which are explicitly acknowledged by the OBO specification (§6.1.2), and which the OBO Flat File Format Guide recommends (§S.3.4) be silently accepted without yielding an error.
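To make the failure mode concrete, here is a minimal sketch that does not use pronto at all. The `Lineage` dataclass, the plain dict, and the `EX:` identifiers are stand-ins I made up for illustration; `symmetrize_strict` mirrors the inner loop quoted above, and `symmetrize_tolerant` is just one possible way to accept dangling parents, not necessarily how the library should fix it.

```python
from dataclasses import dataclass, field


@dataclass
class Lineage:
    """Hypothetical stand-in for a lineage record: direct parents and children."""
    sup: set = field(default_factory=set)
    sub: set = field(default_factory=set)


def symmetrize_strict(lineage):
    """Same shape as the quoted loop: assumes every parent has an entry."""
    for subentity, entry in lineage.items():
        for superentity in entry.sup:
            # Raises KeyError when superentity was never declared.
            lineage[superentity].sub.add(subentity)


def symmetrize_tolerant(lineage):
    """Skip parents without an entry and return them instead of failing."""
    dangling = set()
    for subentity, entry in lineage.items():
        for superentity in entry.sup:
            parent = lineage.get(superentity)
            if parent is None:
                dangling.add(superentity)
                continue
            parent.sub.add(subentity)
    return dangling


def example_lineage():
    # EX:2 is_a EX:1 (declared) and is_a EX:9 (never declared, i.e. dangling).
    return {
        "EX:1": Lineage(),
        "EX:2": Lineage(sup={"EX:1", "EX:9"}),
    }
```

Calling `symmetrize_strict(example_lineage())` raises a KeyError for `EX:9`, while `symmetrize_tolerant` fills in the children of `EX:1` and hands back the set of dangling parents. Whether to skip such parents silently or record them somewhere is a design decision for the library.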
Potential duplicate of #225
We have the following problem in the latest Mondo release:
When running:
is causing:
When running:
Everything is all good.
When removing the is_a statement above:
Everything is good as well:
There are thousands of dangling classes in mondo.obo, so what seems to be the problem?
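For anyone who wants to check which is_a targets in a file are never declared, here is a rough sketch that scans the OBO text directly rather than going through a parser. It assumes the usual line-based layout (`id:` and `is_a:` tags inside stanzas, optional `{...}` modifiers and `!` comments after the value); the function name and the sample identifiers are made up for illustration.

```python
def _tag_value(line, tag):
    """Return the value after `tag`, without trailing {modifiers} or ! comments."""
    value = line[len(tag):]
    value = value.split("!", 1)[0]
    value = value.split("{", 1)[0]
    return value.strip()


def find_dangling_is_a(lines):
    """Map each undeclared is_a target to the set of ids that reference it."""
    declared = set()
    referenced = {}
    current = None  # id of the stanza being read, if any
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header such as [Term] resets the current id.
            current = None
        elif line.startswith("id:"):
            current = _tag_value(line, "id:")
            declared.add(current)
        elif line.startswith("is_a:") and current is not None:
            target = _tag_value(line, "is_a:")
            referenced.setdefault(target, set()).add(current)
    return {t: subs for t, subs in referenced.items() if t not in declared}
```

Since it accepts any iterable of lines, it can be fed an open file handle, e.g. `with open("mondo.obo") as f: dangling = find_dangling_is_a(f)` (the path is an assumption about where the file lives).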