Closed by matentzn 2 years ago
This situation is why I advocate merging all imports before doing the extraction, as done here: https://github.com/balhoff/ultimate-ontology-makefile
But then we don't have our separate "go_import", "pr_import", etc. I am fine with that, but it's a bigger change to old workflows. This kind of goes along with publishing two versions: a base file and a fully merged file.
Yeah, I moved away from that solution because of the memory consumption. Once the set of all imports gets too large, Travis jobs start failing, and even desktop computers run out of memory.
So, Jim is proposing a web-service-based solution to this problem (this would, at least, take care of the memory and storage limitations of Travis). This leads me to a question I have been meaning to ask for a while: @cmungall, do we have resources to deploy a variety of reasoning services for OBO ontologies somewhere in your infrastructure? Something like web services for Owlery deployments of the main ontologies (this is more than OLS: this is about being able to have DL query endpoints, module extraction services, etc.).
Jim's idea for this issue is to deploy a service that allows you to extract a module from an arbitrary union of ontologies. Obviously we would need the usual: fallbacks, load balancing, etc. But I think it's worth a thought!
How will this work with the need to pin releases to version IRIs? We'd need a triplestore with all versions loaded. If we just want the most recent one, then we may as well use Ontobee (and in fact, why not just use OntoFox?).
Yeah, you could not pin a release to a version, that's true. I mean, you could have a config file that took care of that for the web service, of course.
In general, can you see any other way to solve this problem? Some kind of smart ordering of the import goals, such that dependents are extracted right before the ontologies they depend on, with seed.txt regenerated each time? Or any other idea? Or simply run twice :/
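For what it's worth, that "smart ordering" could be sketched like this, assuming we knew which imports reference which others (the goal names and edges below are invented for illustration, not taken from any real ontology):

```python
from graphlib import TopologicalSorter

# Invented edges: "X uses terms from Y" means X's module should be
# extracted before Y's, so that Y's extraction sees a seed.txt that
# already contains the Y-terms X pulled in.
uses = {
    "pr_import": {"go_import"},               # PR axioms mention GO terms
    "cl_import": {"go_import", "pr_import"},  # CL mentions GO and PR terms
}

# Invert into a predecessor graph: each goal waits for every goal that
# uses its terms.
preds = {}
for user, used in uses.items():
    preds.setdefault(user, set())
    for goal in used:
        preds.setdefault(goal, set()).add(user)

# static_order() yields dependents first. Reciprocal dependencies would
# raise graphlib.CycleError, meaning no single-pass order exists and
# only iterating (or running twice) helps.
order = list(TopologicalSorter(preds).static_order())
print(order)  # cl_import, then pr_import, then go_import
```

Each extraction step would then regenerate seed.txt before the next goal in `order` runs.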
Working off a triplestore would require implementing the SLME algorithm over SPARQL. I think that would be cool, but quicker would be to load the ontologies into the OWL API with lots of memory.
On the other hand I regularly consider managing a triplestore containing all historical versions of OBO ontologies.
Am I right that OntoFox does MIREOT but not SLME? Is MIREOT sufficient?
Is twice guaranteed to be enough? We have some reciprocal dependencies, so to guarantee completeness, would you not need to iterate until saturation?
Re: Travis. Are you saying Travis can't handle building the imports, or that a merged import causes issues even when just reasoning? We could simply not have Travis make the imports, and have this be the job of an ontology release manager.
Will think about all of the above tomorrow. Do you think people would be cool with a one-module solution, i.e. all imports in one module?
I wonder if memory issues are caused by annotations and axiom annotations. What about:
1. Merge all base logical axioms in the ontologies of interest
2. SLME to get all classes and logical axioms
3. Re-SLME on each mirrored ontology using 2 as seed
Maybe this is what you meant by do it twice?
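A toy, set-based version of those three steps (this is not real SLME — just a crude signature closure over invented term IDs, standing in for what robot or the OWL API would do):

```python
def extract(axioms, seed):
    """Crude stand-in for SLME: keep every axiom touching the seed,
    growing the seed with each kept axiom's signature, to a fixed point."""
    module, seed = set(), set(seed)
    changed = True
    while changed:
        changed = False
        for ax in axioms:
            if ax not in module and ax & seed:
                module.add(ax)
                seed |= ax
                changed = True
    return module, seed

# Step 1: merge the base logical axioms of the ontologies of interest.
# An "axiom" here is just the set of terms it mentions.
go = {frozenset({"GO:1", "GO:2"})}        # e.g. GO:1 SubClassOf GO:2
pr = {frozenset({"PR:1", "GO:1"})}        # PR:1 refers to GO:1
merged = go | pr

# Step 2: one extraction over the merge yields the full seed signature.
_, full_seed = extract(merged, {"PR:1"})  # the edit file only uses PR:1

# Step 3: re-extract from each mirror separately, using step 2 as seed.
go_module, _ = extract(go, full_seed)
pr_module, _ = extract(pr, full_seed)

print(sorted(full_seed))                  # ['GO:1', 'GO:2', 'PR:1']
```

The point of the toy: even though the starting seed mentions only PR:1, step 2 pulls GO:1 (and its ancestors) into the seed, so the per-ontology re-extraction in step 3 no longer misses anything.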
Also, which ontologies are the memory hogs? I'm guessing:
We already release a taxslim, which is good for many purposes. I've asked PR before for a species-neutral-level subset; I don't think any ontology ever needs the part that duplicates UniProt or below.
I am just worried that steps 1 and 2 will take too long and cause memory exceptions if huge non-base ontologies are in the mix. I will try it a bit next week.
@balhoff What do you think of Chris's suggestion?
Or do you still favour the merge-all, extract-one approach, and just assume the client has 8 GB of main memory to do this?
I am currently working on this revised system for dynamic imports. It is complex, but it is a big thorn in my side, as it has been broken for quite a while now (the last ODK thorn; the rest is mostly cosmetics).
The idea is this:
- `mirror/go_label.owl` is the set of all label triples in GO (`{GO:001 rdfs:label "X process"; ...}`).
- `dosdp-dict.owl` is the merge of all `mirror/x_label.owl` files. This file is used to generate DOSDP labels when using `dosdp-generate`. Note that in order to use this for matching as well, we need to extend it to subclass-of axioms plus labels, which is considerably more expensive to do, as it requires reasoning.
- `pre-seed.txt` is extracted from `definitions.owl`, `o-edit.owl`, and any component files (remember, components are files that formally belong to the edit file but are managed in separate artefacts, like `maxo-obs.owl`).
- `mirror/go_logical.owl` is a subset of GO that contains only logical axioms. This ensures that if we do a module extraction, the memory footprint is not too large.
- `mirror_logical_merged.owl` merges all `mirror/x_logical.owl` files into one.
- `mirror_logical_module.owl` is extracted from `mirror_logical_merged.owl`, using `pre-seed.txt` as the seed.
- `seed.txt` is extracted from `mirror_logical_module.owl`, and now contains absolutely all terms we need to build the modules.
- `imports/go_import.owl`

This looks complex, but I think it's sound. Please review this, @cmungall @dosumis @balhoff, as I want to implement it asap.
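The chain above can be written down as a small dependency graph to sanity-check the build order. The file names come from the comment itself, but note that the inputs I assign to `imports/go_import.owl`, and the mirror file name `mirror/go.owl`, are my guesses — the comment does not spell them out:

```python
from graphlib import TopologicalSorter

# target -> inputs, per the list above; "components" stands in for any
# component files such as maxo-obs.owl. Guessed entries are marked.
inputs = {
    "mirror/go_label.owl": {"mirror/go.owl"},        # mirror name guessed
    "dosdp-dict.owl": {"mirror/go_label.owl"},
    "pre-seed.txt": {"definitions.owl", "o-edit.owl", "components"},
    "mirror/go_logical.owl": {"mirror/go.owl"},
    "mirror_logical_merged.owl": {"mirror/go_logical.owl"},
    "mirror_logical_module.owl": {"mirror_logical_merged.owl", "pre-seed.txt"},
    "seed.txt": {"mirror_logical_module.owl"},
    # Guessed: the import module is built from the mirror and the final seed.
    "imports/go_import.owl": {"mirror/go.owl", "seed.txt"},
}

# One valid make-style schedule; graphlib.CycleError would flag a loop.
order = list(TopologicalSorter(inputs).static_order())
assert order.index("pre-seed.txt") < order.index("mirror_logical_module.owl")
assert order.index("seed.txt") < order.index("imports/go_import.owl")
print(order)
```

If the graph is right, the whole thing is acyclic, so a single `make` pass per ontology set suffices; the complexity is in the number of intermediate artefacts, not in the ordering.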
@matentzn can we discuss this on an ODK call?
I tried it a few times.. we can try to discuss it again :P It really needs to be solved, and without external dependencies, I think :(
We have been playing with the idea of using a merged ontology to extract modules from, but there are worries about losing fine-grained control, like avoiding pulling from a "current" version that broke something in my own ontology. We have now decided that, after all, we should keep an ontology-by-ontology workflow, and maybe even boost it to something more in the direction of Maven. I will implement the union-extract technique after all, using the union. We will solve the CHEBI, PRO and NCBITaxon issue separately.
This is now addressed since 1.2.32 with the new BASE pipeline. Yay.
When using base releases, seeding is currently incomplete. Example:

- O imports GO and PR
- GO:1 belongs to GO
- PR uses GO:1

Starting point: O is empty.

1. GO is extracted first; O does not contain GO:1.
2. PR is extracted; O now contains GO:1, but since GO has already been extracted, GO:1's axiom dependencies are missing.
I believe the problem can only be remedied by running the imports pipeline twice; or does anyone have a better idea? @cmungall @balhoff
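The "run it twice" idea generalises to iterating until the seed stops growing. Here is a plain-set sketch of the example above (all term IDs invented; the intersection test is a stand-in for real module extraction):

```python
# An "axiom" is just the set of terms it mentions.
base = {
    "GO": [frozenset({"GO:1", "GO:parent"})],
    "PR": [frozenset({"PR:1", "GO:1"})],     # PR uses GO:1
}

def run_imports(seed):
    """One pipeline pass: extract each import in a fixed order,
    growing the seed as each module is produced."""
    modules = {}
    for onto in ["GO", "PR"]:                # GO is extracted first
        modules[onto] = [ax for ax in base[onto] if ax & seed]
        for ax in modules[onto]:
            seed = seed | ax
    return modules, seed

seed = {"PR:1"}                              # O itself only mentions PR:1
modules, seed = run_imports(seed)
assert modules["GO"] == []                   # first pass misses GO:1's axiom

# A second pass (or iterating until the seed reaches a fixed point)
# picks it up.
prev = None
while seed != prev:
    prev = set(seed)
    modules, seed = run_imports(seed)

assert modules["GO"] == [frozenset({"GO:1", "GO:parent"})]
```

With only two reciprocal players, twice is enough; in general the loop runs at most once per ontology in the longest dependency chain, since the seed grows monotonically.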