biopragmatics / pyobo

📛 A Python package for using ontologies, terminologies, and biomedical nomenclatures
https://pyobo.readthedocs.io
MIT License

Adding other ontolo-dbs #194

Open ialarmedalien opened 1 week ago

ialarmedalien commented 1 week ago

I'm interested in potentially using pyobo to convert some other databases into ontologies after coming across https://biopragmatics.github.io/obo-db-ingest/ in my journeys around the ontolo-sphere. Are there any particular criteria that you had for picking the databases that you did (presumably just databases that you needed), and were there any dbs that you rejected? Did you put together any documentation about the project, particularly on decisions made when translating db entities into ontology terms?

I'm interested in "ontologising" MetaCyc and possibly others (in addition to some that you have already done: EC, KEGG, Reactome), so staying as consistent as possible with the OBOification that has already been done would be best.

cthoyt commented 1 week ago

Most of what's in here is based on need/interest. I'm happy to accept PRs for new sources; you can check inside https://github.com/biopragmatics/pyobo/tree/main/src/pyobo/sources for examples of how this is done for other resources. At the moment, there isn't a very good contribution guide, but if you're keen on getting started, I can prepare some material for you.

WRT how to actually go about choosing an appropriate ontologization, this is a very difficult kind of domain knowledge to communicate. I'll have to think about how I would go about documenting this. In the meantime, I'd again suggest reading up on what already exists, and feel free to open as many issues on this repo with questions as you have!

I'll mostly reject anything where conversion can't be 100% automated. Here are a few examples of larger chunks of code I spun out of PyOBO to automate fetching trickier resources: https://github.com/cthoyt/drugbank-downloader, https://github.com/cthoyt/umls_downloader, and https://github.com/cthoyt/chembl-downloader.
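
As a rough illustration of what "100% automated" conversion can look like, here is a minimal sketch that turns database records into OBO `[Term]` stanzas with no manual steps. It does not use PyOBO's API; the `Term` class, the `records_to_terms` and `to_obo_document` helpers, the `exampledb` prefix, and the record layout are all hypothetical and only meant to show the general shape of such a pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A hypothetical, stripped-down ontology term (not PyOBO's Term class)."""

    prefix: str
    identifier: str
    name: str
    parents: list[str] = field(default_factory=list)  # CURIEs of parent terms

    @property
    def curie(self) -> str:
        return f"{self.prefix}:{self.identifier}"

    def to_obo(self) -> str:
        """Render this term as an OBO flat-file [Term] stanza."""
        lines = ["[Term]", f"id: {self.curie}", f"name: {self.name}"]
        lines.extend(f"is_a: {parent}" for parent in self.parents)
        return "\n".join(lines)


def records_to_terms(prefix, records):
    """Map raw database records (assumed to be dicts) to Term objects."""
    for record in records:
        yield Term(
            prefix=prefix,
            identifier=record["id"],
            name=record["name"],
            parents=[f"{prefix}:{p}" for p in record.get("parents", [])],
        )


def to_obo_document(ontology, terms):
    """Join a minimal header and all term stanzas into one OBO document."""
    header = f"format-version: 1.2\nontology: {ontology}"
    return "\n\n".join([header, *(term.to_obo() for term in terms)]) + "\n"


# Made-up records standing in for rows parsed from a database dump.
EXAMPLE_RECORDS = [
    {"id": "0001", "name": "root"},
    {"id": "0002", "name": "child", "parents": ["0001"]},
]

if __name__ == "__main__":
    terms = list(records_to_terms("exampledb", EXAMPLE_RECORDS))
    print(to_obo_document("exampledb", terms))
```

In a real source, the records would come from a scripted download and parse step rather than a hard-coded list, which is the part the downloader packages above take care of.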

If you have a wish-list, please open a separate issue on this repo for each item so we can have threaded discussions about them.

ialarmedalien commented 1 week ago

Thanks for the quick response and all the information. I used to work on GO so I'm familiar with ontologies, and I will pick through the sources you mentioned to see what would be appropriate for the data I'm interested in.

Are you on the obo-community slack, and if so, is there an appropriate channel there for discussing these topics? It's a bit easier than communicating via GitHub.

cthoyt commented 1 week ago

sure, I'm there! we have a #databases-as-ontologies channel

but keep in mind that it's a free slack and all discussions disappear after 90 days, and it's better for the sake of an open source project to use the issue tracker for discussion and planning