Closed by jseager7 1 year ago
First of all: @jseager7 GREAT to hear from you!
Secondly: try setting mirror_retry_download and mirror_max_time_download in the imports: section (not on a per-ontology basis, but for all ontologies at once).
Let me know if these help!
@matentzn Thanks for the suggestions. Unfortunately, none of them have solved the problem.
I tried adding the ChEBI slim to phipo-odk.yaml, but build.sh just tried to download the 670M file again.
My connection doesn't seem flaky, since the data is being transferred at a steady rate. I presume it's just not fast enough to finish before the timeout.
I tried setting mirror_max_time_download to 600 and build.sh finished without timing out, but then prepare_release.sh timed out instead. See below for the console log.
if [ true = true ] && [ true = true ]; then curl -L http://purl.obolibrary.org/obo/chebi.owl --create-dirs -o mirror/chebi.owl --retry 4 --max-time 200 &&\
robot --catalog catalog-v001.xml convert -i mirror/chebi.owl -o mirror-chebi.tmp.owl &&\
mv mirror-chebi.tmp.owl tmp/mirror-chebi.owl; fi
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   328    0   328    0     0   1426      0 --:--:-- --:--:-- --:--:--  1426
 75  670M   75  508M    0     0  2601k      0  0:04:24  0:03:20  0:01:04 2649k
curl: (28) Operation timed out after 199777 milliseconds with 532725848 out of 703555307 bytes received
Warning: Problem : timeout. Will retry in 1 seconds. 4 retries left.
Throwing away 532725848 bytes
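As a rough check on the figures in this log, the sketch below estimates the smallest `--max-time` that would let the transfer finish at the reported average speed. The helper name `min_seconds` is hypothetical (it is not part of ODK or curl), and it assumes that curl's `k` suffix means 1024 bytes and that the rate stays constant.

```shell
#!/bin/sh
# Hypothetical helper (not part of ODK or curl): smallest whole number of
# seconds needed to transfer SIZE bytes at RATE bytes per second, rounded up.
min_seconds() {
    size=$1
    rate=$2
    echo $(( (size + rate - 1) / rate ))
}

# Figures taken from the curl output above: 703555307 bytes in total,
# average download speed 2601k (assumed to be 2601 * 1024 bytes per second).
min_seconds 703555307 $((2601 * 1024))   # prints 265
```

At that rate the 200-second limit in the command above is too short, which agrees with curl's own estimated total time of 0:04:24, while a limit of 600 leaves a wide margin.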
I still have some leftover stuff in phipo.Makefile that was probably trying to solve problems with ChEBI. Maybe this is now causing problems:
imports/chebi_import.owl: mirror/chebi.owl imports/chebi_terms_combined.txt
	if [ $(IMP) = true ]; then $(ROBOT) extract -i $< -T imports/chebi_terms_combined.txt --force true --method BOT \
		annotate --ontology-iri $(ONTBASE)/$@ $(ANNOTATE_ONTOLOGY_VERSION) --output $@.tmp.owl && mv $@.tmp.owl $@; fi
.PRECIOUS: imports/chebi_import.owl
I don't really want to keep running prepare_release.sh since it seems to download a mirror of every ontology every time, which wastes time and data (possibly related: https://github.com/INCATools/ontology-development-kit/issues/863). It also throws away hundreds of megabytes of downloaded data every time ChEBI times out.
Did you run the update_repo workflow? Can you point me to a PR?
I updated to ODK v1.4.1 today with the update_repo workflow. I think I did this before trying the release.
There's no PR yet, but you can check the release branch on our repo. Here's the diff:
If you want me to make a PR so we can collaborate on fixing this, I'm happy to do that.
Yes, a draft PR would be better.
@jseager7 for CHEBI in particular you could use http://purl.obolibrary.org/obo/chebi.owl.gz; it's only 46 MB.
I ended up fixing this by using the slim version of ChEBI. The fix didn't work at first because I forgot to run the command to update the Makefile.
Thanks @balhoff for the suggestion about using compressed versions of the ontologies. I might use this for other ontologies.
To use the compressed versions, do you just add the PURLs to the ODK YAML file as a mirror_from property? For example:
- id: chebi
  make_base: TRUE
  mirror_from: http://purl.obolibrary.org/obo/chebi.owl.gz
When running build.sh I've noticed that creating mirrors of some very large ontologies fails because cURL can't download the ontology file before hitting the timeout limit (200 seconds).
Here's an example of the error:
This mainly affects the ChEBI OWL file, which is huge (~670 MB). I can download about 630 MB before timing out.
I tried to override the curl command in phipo.Makefile to set a higher timeout limit (--max-time 600) by copying these lines to the makefile:
But this doesn't seem to have any effect. The console still reports that approximately 3 minutes are left once the download starts:
Is there anything I'm missing?