Closed lyschoening closed 3 years ago
@lyschoening nice implementation. I think it would make sense to add your read_obo
function to biosustain/obo
. If you open a pull request on biosustain/obo
, I'll leave the few comments I have on your implementation. This would result in two implementations to go from OBO to networkx.
While I like avoiding duplication of efforts, I think there'd be a valuable testing motivation. I'd like to see whether both packages produce the same networks. I'm not sure about your experience, but it's been hard to know whether my parsing is working... since there is no ground truth to test against. The greatest difficultly is likely in parsing node attributes.
It would be good not to be the only maintainer of the parser and for there to be a package that makes use of it.
I'm hesitant to switch obonet
to use biosustain/obo
's parsing, since I think your method is more complex (at least in lines of code). But if we find out that my approach is failing in ways that are not easy to rectify, I'd consider making biosustain/obo
a dependency.
As @cmungall pointed out, there's additional networkx logic that would be nice (specifying which relationship types you want to traverse). For this work, I agree that we should find a way not to duplicate efforts. If biosustain/obo
and obonet
produce the same networkx data structure and are tested against each other, that will be easy.
@dhimmel I would not want to add networkx to biosustain/obo
. After all, this is your work. The code is in this issue and is straightforward to test.
While I like avoiding duplication of efforts, I think there'd be a valuable testing motivation. I'd like to see whether both packages produce the same networks. I'm not sure about your experience, but it's been hard to know whether my parsing is working... since there is no ground truth to test against. The greatest difficultly is likely in parsing node attributes.
It is indeed quite difficult to parse tag values, which is why I was hoping to work together on a single parser implementation. Some of the complexity of my parser implementation comes down to trying to parse relevant tags into more accessible formats. So if you start looking at other tags such as the remaining relationship types, xrefs or subsets, you might find the package quite useful!
I went ahead and implemented a writer for biosustain/obo
. Using it on taxrank and SO helped me spot some of the remaining issues in the parser. The result is the identical bit-for-bit, which is a good enough baseline for me although there are certainly still some more issues left to iron out. I will attempt to publish a version of obo
this week and will keep an eye on your progress with obonet
.
The result is the identical bit-for-bit, which is a good enough baseline for me although there are certainly still some more issues left to iron out.
Awesome. The round-trip equivalence does provide a nice degree of confidence. You could even evaluate the round-trip for every OBO from OBOFoundry.
I will attempt to publish a version of
obo
this week and will keep an eye on your progress withobonet
.
Cool. My stance is to give obonet
and biosustain/obo
some time to mature in the wild. If enough people are using obonet
that it makes sense to continue developing it and biosustain/obo
become the de facto OBO parser for python, I'd be happy to outsource parsing to biosustain/obo
.
re: ground truth
convert the obo to obographs json using the java here: https://github.com/geneontology/obographs this can be regarded as the reference translation
it should be very straightforward to define a reference translation from obograph-json to networkx
FYI
We're getting ready to split out our python ontology access library from our flaskrest services layer: https://github.com/biolink/biolink-api/issues/49
We're thinking of a new repo and pypi module something like 'ontobio'. It's fairly different from obonet in that it doesn't even attempt obo parsing (it takes obojson, or it can connect to a SPARQL server and obtain the ontology from there; it can take obo as input but it calls a java program via shell which is a bit icky). It's aligned in that it defers most of its object model to networkx (although it does have some additional capabilities beyond what you would want to do with networkx, e.g. synonyms)
@cmungall great can't wait to try out biolink/ontobio
.
Ideally, any networkx traversal functionality I write in obonet
can also work on an ontobio graph (i.e. similar networkx structures). I'd like there to be an easy transition for migrating from reading OBO files to the JSON files.
Are you planning on updating the readme to document reading a JSON into networkx in Python? If so, I'll wait till then before playing around.
I don't know how much interest there is still in this, but I managed to implement a more complete OBO file parser for one of my own projects. It doesn't strictly check the cardinality of tags yet, but is able to parse all known tags down to its components using pyparsing. https://github.com/zachary822/oboparse
Thanks for the update!
Since the original post, I have been switching code to use @althonos' fastobo parser: https://pypi.org/project/fastobo/
On Thu, Jul 25, 2019 at 9:21 PM Zachary Juang notifications@github.com wrote:
I don't know how much interest there is still in this, but I managed to implement a more complete OBO file parser for one of my own projects. It still doesn't yet strictly check the cardinality of tags, but is able to parse all known tags down to its components using pyparsing. https://github.com/zachary822/oboparse
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dhimmel/obonet/issues/4?email_source=notifications&email_token=AAAMMOJMYBPUMDPWDI7BBWTQBJ3VZA5CNFSM4DFJXTTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23OOQQ#issuecomment-515303234, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAMMOJ7TPOEFN3MYRPQ6HLQBJ3VZANCNFSM4DFJXTTA .
https://github.com/dhimmel/obonet/commit/e22bf7b40763a2cc810d4caee68e6871f5f3820e mentions the pronto_to_multidigraph
function in the nxontology package, which uses pronto, which in turn uses fastobo. This stack is nice since it adds support for reading OWL and OBO Graphs JSON files as well as OBO files.
Anyways, obonet
seems to work well for reading OBO files and is still probably the simplest solution for generating a networkx.MultiDiGraph
from a .obo
file. I'm not adding any major new features, but performing light maintenance to keep it up-to-date with Python. Users seem to like it. So closing this issue since major changes to obonet are unlikely.
@dhimmel
I took the liberty of seeing what it would take to implement
obonet
with my obo package:The lookup I wanted to do checking if X is_a "sequence_feature" in SO. Ignoring for a minute that there are other relationships than "is_a" (as @cmungall pointed out), the lookup would be something like:
If it would be interesting to you to collaborate on the parser and data model, let me know. It would be good not to be the only maintainer of the parser and for there to be a package that makes use of it. On the other hand I also understand if you prefer to do your own thing.