Open erikyao opened 3 years ago
@DylanWelzel `obonet` is now on v1.0.0. Let's re-evaluate whether we still need to keep our local fix for this issue.
`obonet` v1.0.0 still does not correctly parse the `def` field. The local fix will stay, but I've updated the `obonet` version to v1.0.0 in the requirements.
Version
Related To
Priority
Low. Currently it's not an issue. Maybe an issue in the future.
Problem
The `def` field of an OBO ontology has the format `<def_string> [<dbxref>]`. See GO.format.obo-1_4.html#S.2.2.
The `obonet` library reads such a field incorrectly, as one whole string. However, the `def` fields within the current ChEBI obo file all have empty `<dbxref>` lists. Our current implementation trims them from the string values of the `def` fields. Note that the quotes inside are also removed.
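For illustration, the trimming described above might look roughly like the sketch below. This is not the code in `chebi_parser.py`; the function name `trim_def` and the sample value are hypothetical, and it assumes the raw value is a double-quoted string followed by an empty `[]` list.

```python
def trim_def(raw: str) -> str:
    """Strip a trailing empty dbxref list and the enclosing quotes.

    Hypothetical helper: it assumes `raw` looks like '"<def_string>" []'.
    """
    value = raw.strip()
    # Drop the empty "[]" list that follows the quoted definition text.
    if value.endswith("[]"):
        value = value[:-2].rstrip()
    # Drop the double quotes that enclose the definition text.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return value


# Made-up sample value, not taken from the ChEBI file.
print(trim_def('"A hypothetical definition." []'))
```

A value whose list is not empty passes through this sketch unchanged, with its quotes and brackets still attached.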
Our current implementation cannot handle any `def` field with a non-empty `<dbxref>` list.
Solution
`pronto` is another library for reading obo files. It is more heavyweight yet lower-level. It has a clear class hierarchy but is not well documented. An alternative implementation to the `OntologyReader` in `chebi_parser.py` is `ProntoOntologyReader.py`.
Performance-wise:
- `pronto` is about 4 times slower than `obonet`. On `rel201/chebi_lite.obo`, `ProntoOntologyReader` takes ~150 seconds to load the file and generate all 146,183 documents, while our implementation with `obonet` takes only ~30 seconds.
- `pronto` uses slightly more memory than `obonet`.
We can also watch for updates to `obonet` on this issue.
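In the meantime, the local fix could in principle be extended to non-empty lists by splitting the raw string ourselves. The following is only a sketch under simplifying assumptions: `split_def` and `DEF_PATTERN` are hypothetical names, the sample value is made up, dbxrefs are assumed to be separated by plain commas (no quoted descriptions or escaped commas), and trailing modifiers are not handled.

```python
import re

# Double-quoted text (backslash escapes allowed) followed by a bracketed list.
DEF_PATTERN = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"\s*\[(?P<xrefs>.*)\]\s*$'
)


def split_def(raw: str):
    """Split a raw def value into (definition text, list of dbxref strings).

    Hypothetical helper; escape sequences in the text are left as-is.
    """
    match = DEF_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized def value: {raw!r}")
    text = match.group("text")
    # Simplification: split on commas and ignore empty entries.
    xrefs = [x.strip() for x in match.group("xrefs").split(",") if x.strip()]
    return text, xrefs


# Made-up sample value with a non-empty dbxref list.
text, xrefs = split_def('"A hypothetical definition." [PMID:12345, ISBN:0000000000]')
print(text)
print(xrefs)
```

An empty list yields an empty `xrefs` list, so the same function would cover the current ChEBI case as well.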