OBOFoundry / OBO-Dashboard

Summary Dashboard for Open Biological and Biomedical Ontologies
http://dashboard.obofoundry.org/

Reduce overuse of ROBOT commands for efficiency #131

Open anitacaron opened 2 months ago

anitacaron commented 2 months ago

There's a function that generates a ROBOT command chaining many commands, some of which may be unnecessary. The chain can be slow and memory-intensive for large ontologies, e.g., NCBITaxon, PCL, or NCIT.

The chained commands are:

  1. merge
  2. measure
  3. remove (to create a base file, but in some cases, the input is already a base file)
  4. measure (on the base file; the result is not used)
  5. merge (outputs a base file, which can be the same as the input)

I would suggest removing at least the second measure.

To illustrate, this is the current ROBOT command for NCBITaxon:

robot merge -i build/ontologies/ncbitaxon-raw.owl \
measure --prefix 'NCBITAXONALT: http://purl.obolibrary.org/obo/ncbitaxon#' \
--prefix 'COVOC: http://purl.obolibrary.org/obo/COVOC_' \
--prefix 'CIDO: http://purl.obolibrary.org/obo/CIDO_' \
--prefix 'dbpedia: http://dbpedia.org/resource/' \
--prefix 'EFO: http://www.ebi.ac.uk/efo/EFO_' \
--prefix 'ONTONEO: http://purl.bioontology.org/OntONeo/ONTONEO_' \
--metrics extended-reasoner -f yaml -o build/ontologies/ncbitaxon-metrics.yml \
remove --base-iri http://purl.obolibrary.org/obo/NCBITAXON_ \
--base-iri http://purl.obolibrary.org/obo/ncbitaxon# \
--base-iri http://purl.obolibrary.org/obo/NCBITaxon_ \
--axioms external --trim false -p false \
measure --prefix 'NCBITAXONALT: http://purl.obolibrary.org/obo/ncbitaxon#' \
--prefix 'COVOC: http://purl.obolibrary.org/obo/COVOC_' \
--prefix 'CIDO: http://purl.obolibrary.org/obo/CIDO_' \
--prefix 'dbpedia: http://dbpedia.org/resource/' \
--prefix 'EFO: http://www.ebi.ac.uk/efo/EFO_' \
--prefix 'ONTONEO: http://purl.bioontology.org/OntONeo/ONTONEO_' \
--metrics extended-reasoner -f yaml -o build/ontologies/ncbitaxon-metrics.yml.base.yml \
merge --output build/ontologies/ncbitaxon.owl

https://github.com/OBOFoundry/OBO-Dashboard/blob/e39cde99db31b0b32ac1a064f12286eb19da1143/util/lib.py#L343-L377
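A minimal sketch of what dropping the second measure could look like. This is not the code in util/lib.py: the function name `build_robot_command` and its parameters are hypothetical, and only ROBOT options that appear in the command above are used.

```python
import shlex


def build_robot_command(raw_path, metrics_path, output_path, base_iris, prefixes):
    """Build the ROBOT chain with a single `measure` step.

    Hypothetical helper for illustration; names and signature are assumptions,
    not the actual function in util/lib.py.
    """
    # 1. merge: load the raw ontology
    cmd = ["robot", "merge", "-i", raw_path]

    # 2. measure: the only metrics pass (the second one is dropped)
    cmd.append("measure")
    for prefix in prefixes:
        cmd += ["--prefix", prefix]
    cmd += ["--metrics", "extended-reasoner", "-f", "yaml", "-o", metrics_path]

    # 3. remove: strip external axioms to create the base file
    cmd.append("remove")
    for iri in base_iris:
        cmd += ["--base-iri", iri]
    cmd += ["--axioms", "external", "--trim", "false", "-p", "false"]

    # 4. merge: write the result
    cmd += ["merge", "--output", output_path]
    return cmd


if __name__ == "__main__":
    command = build_robot_command(
        "build/ontologies/ncbitaxon-raw.owl",
        "build/ontologies/ncbitaxon-metrics.yml",
        "build/ontologies/ncbitaxon.owl",
        ["http://purl.obolibrary.org/obo/NCBITaxon_"],
        ["EFO: http://www.ebi.ac.uk/efo/EFO_"],
    )
    # shlex.join quotes arguments containing spaces, such as the prefixes
    print(shlex.join(command))
```

Building the command as a list of arguments rather than one string avoids unbalanced-quote mistakes in the `--prefix` values.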

matentzn commented 2 months ago

> measure (on the base file; the result is not used)

I am wondering whether it is a good thing that it is not being used. Does it mean the file metrics we report in the dashboard all use the whole ontology?

anitacaron commented 2 months ago

> Does it mean the file metrics we report in the dashboard all use the whole ontology?

Yes, but only in cases where the base was generated.

anitacaron commented 2 months ago

The simple solution would be to change the order of the chain.

  1. merge
  2. remove (create the base file if make_base is true)
  3. measure
  4. merge
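
The reordered chain above could be sketched as follows. The function name `build_reordered_command` and its parameters are hypothetical; `make_base` is taken from the list above, and how it is actually set in util/lib.py is not shown here. Only ROBOT options from the original command are used.

```python
def build_reordered_command(raw_path, metrics_path, output_path,
                            base_iris, prefixes, make_base):
    """Build the chain merge -> (remove) -> measure -> merge.

    Hypothetical helper for illustration; not the actual code in util/lib.py.
    """
    # 1. merge: load the input ontology
    cmd = ["robot", "merge", "-i", raw_path]

    # 2. remove: only when a base file still has to be created
    if make_base:
        cmd.append("remove")
        for iri in base_iris:
            cmd += ["--base-iri", iri]
        cmd += ["--axioms", "external", "--trim", "false", "-p", "false"]

    # 3. measure: runs once, on whatever the previous step produced
    cmd.append("measure")
    for prefix in prefixes:
        cmd += ["--prefix", prefix]
    cmd += ["--metrics", "extended-reasoner", "-f", "yaml", "-o", metrics_path]

    # 4. merge: write the output file
    cmd += ["merge", "--output", output_path]
    return cmd
```

With this order, the single measure step reports metrics for the base whenever remove runs, instead of for the whole ontology.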

matentzn commented 2 months ago

If the remove command is not run when the base file is already available, then OK, I guess we can do that. It's a bit weird for ontologies that don't have a base, like application ontologies, to show metrics of the base (think OMO), but I am not opposed to trying this and seeing how it looks!