althonos / pronto

A Python frontend to (Open Biomedical) Ontologies.
https://pronto.readthedocs.io
MIT License
228 stars 47 forks

Option to cache ontologies #62

Open cthoyt opened 4 years ago

cthoyt commented 4 years ago

I've been debugging the CHIRO (CHEBI Integrated Role Ontology) OBO export, and it had a few issues. First, I had to manually add some Typedef stanzas for its ad-hoc relations. Second, I had to switch the imports from its slimmed versions to the originals since the slim versions were missing several entities.

This led me to the problem that it has to download each ontology file every time, and this takes a loooong time. Therefore, I'd like to request an option to cache OBO files (either the source .obo or a pre-compiled version such as a pickle or OBO JSON).

I understand there could be problems with keeping the caches up-to-date, but maybe there's a simple way to add a dictionary argument to Ontology.__init__ so I can specify where I have my own copies, like:

```python
from pronto import Ontology

Ontology.from_obo_library('chiro.obo', cache_files={
    'http://purl.obolibrary.org/obo/chiro/imports/chebi_import.owl': '/Users/cthoyt/obo/chebi.owl',
    'http://purl.obolibrary.org/obo/chiro/imports/envo_import.owl' : '/Users/cthoyt/obo/envo.owl',
    ...
})
```

Or alternatively, maybe you have an idea that could take care of this kind of caching for me so I don't have to look into the imports of the OBO file specifically.
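In the meantime, a download cache can be approximated outside of pronto: fetch each remote file once into a local directory keyed by its URL, and point the loader at the cached copy. A minimal sketch (the `cached_path` helper and the `~/.cache/obo` directory are illustrative, not part of pronto):

```python
# Hypothetical caching helper (not part of pronto's API): download a
# remote ontology file on first use, then reuse the local copy.
import hashlib
import os
import urllib.request

def cached_path(url, cache_dir="~/.cache/obo"):
    """Return a stable local path for `url`, downloading it on first use."""
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so distinct imports never collide on filename.
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    filename = "{}-{}".format(digest, os.path.basename(url))
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

The cached path could then be handed to `Ontology(...)` directly, though this only covers the top-level file, not the imports pronto resolves internally, which is why a hook inside the library would still be preferable.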

althonos commented 4 years ago

Hi @cthoyt ,

you currently have a (hacky) workaround: if you replace `import: http://purl.obolibrary.org/obo/chiro/imports/chebi_import.owl` with `import: chebi_import.owl` in the source OBO, pronto will try to use a local file named `chebi_import.owl` instead.
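That rewrite can be scripted rather than done by hand. A small sketch (the `localize_imports` helper and the mapping values are illustrative):

```python
# Rewrite remote `import:` lines in OBO text to point at local copies,
# so pronto resolves them as local files instead of downloading.
def localize_imports(obo_text, mapping):
    """Replace each `import: <url>` line whose URL is in `mapping`."""
    out = []
    for line in obo_text.splitlines():
        if line.startswith("import:"):
            url = line.split(":", 1)[1].strip()
            line = "import: " + mapping.get(url, url)
        out.append(line)
    return "\n".join(out)
```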

I could add something to provide sources through an interface (a SourceProvider interface, say). That would indeed be a better way to let the user implement this, and the current code could benefit from the refactoring anyway.
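One possible shape for such an interface, purely as a sketch (none of these names exist in pronto): providers are tried in order, and the first one that can serve a URL wins, so a user-supplied cache can shadow the default downloader.

```python
# Illustrative SourceProvider sketch: resolvers are consulted in order
# until one returns a file handle for the requested import URL.
import abc
import io

class SourceProvider(abc.ABC):
    @abc.abstractmethod
    def open(self, url):
        """Return a binary file handle for `url`, or None to decline."""

class DictProvider(SourceProvider):
    """Serve imports from an in-memory mapping (stands in for a
    local-file cache in this sketch)."""
    def __init__(self, mapping):
        self.mapping = mapping
    def open(self, url):
        data = self.mapping.get(url)
        return io.BytesIO(data) if data is not None else None

def resolve(url, providers):
    """Return the first handle a provider yields for `url`."""
    for provider in providers:
        handle = provider.open(url)
        if handle is not None:
            return handle
    raise FileNotFoundError(url)
```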