EBISPOT / scatlas_ontology

SCA_Ontology

Errors when running the scatlas generation pipeline #31

Open pcm32 opened 2 years ago

pcm32 commented 2 years ago

I have set up the scatlas generation pipeline to run from a Singularity container (see PR #30; it is based on the same container used with Docker and stems from your current master, so please let me know if I should be basing it on a different branch) so that it can run on the cluster from our internal CI. However, on running:

export USE_SINGULARITY=true
export TAG=v1.3.1

bash run_release.sh

I'm getting some errors (look at the bottom mostly):

==> log.err <==
Makefile:429: warning: overriding recipe for target 'mirror-ordo'
Makefile:415: warning: ignoring old recipe for target 'mirror-ordo'
scatlas.Makefile:11: warning: overriding recipe for target 'tmp/seed.txt'
Makefile:232: warning: ignoring old recipe for target 'tmp/seed.txt'
scatlas.Makefile:104: warning: overriding recipe for target 'scatlas-full.owl'
Makefile:561: warning: ignoring old recipe for target 'scatlas-full.owl'
scatlas.Makefile:153: warning: overriding recipe for target 'update_repo'
Makefile:607: warning: ignoring old recipe for target 'update_repo'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   336  100   336    0     0   1662      0 --:--:-- --:--:-- --:--:--  1671

  0 10.1M    0 51042    0     0  69091      0  0:02:34 --:--:--  0:02:34 69091
100 10.1M  100 10.1M    0     0   9.9M      0  0:00:01  0:00:01 --:--:-- 36.6M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   368  100   368    0     0   2106      0 --:--:-- --:--:-- --:--:--  2114

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  6 72.7M    6 5191k    0     0  1633k      0  0:00:45  0:00:03  0:00:42 1732k
100 72.7M  100 72.7M    0     0  20.9M      0  0:00:03  0:00:03 --:--:-- 22.0M

==> log.out <==
touch  components/bto_seed_extract.sparql  components/cl_seed_extract.sparql  components/chebi_seed_extract.sparql  components/go_seed_extract.sparql  components/peco_seed_extract.sparql  components/pato_seed_extract.sparql  components/fbbt_seed_extract.sparql  components/to_seed_extract.sparql  components/po_seed_extract.sparql  components/wbls_seed_extract.sparql  components/ncit_seed_extract.sparql  components/uberon_seed_extract.sparql  components/rs_seed_extract.sparql  components/fbdv_seed_extract.sparql  components/omit_seed_extract.sparql  components/ordo_seed_extract.sparql  components/obi_seed_extract.sparql  components/ordo_seed_extract.sparql  components/uberon-bridge-to-fbbt_seed_extract.sparql  components/ncbitaxon_seed_extract.sparql  components/efo_seed_extract.sparql  components/cl-bridge-to-fbbt_seed_extract.sparql   &&\
touch  components/bto_simple_seed.txt  components/cl_simple_seed.txt  components/chebi_simple_seed.txt  components/go_simple_seed.txt  components/peco_simple_seed.txt  components/pato_simple_seed.txt  components/fbbt_simple_seed.txt  components/to_simple_seed.txt  components/po_simple_seed.txt  components/wbls_simple_seed.txt  components/ncit_simple_seed.txt  components/uberon_simple_seed.txt  components/rs_simple_seed.txt  components/fbdv_simple_seed.txt  components/omit_simple_seed.txt  components/ordo_simple_seed.txt  components/obi_simple_seed.txt  components/ordo_simple_seed.txt  components/uberon-bridge-to-fbbt_simple_seed.txt  components/ncbitaxon_simple_seed.txt  components/efo_simple_seed.txt  components/cl-bridge-to-fbbt_simple_seed.txt
mkdir -p tmp
if [ true  = true ] && [ true  = true ]; then curl -L http://purl.obolibrary.org/obo/bto.owl --create-dirs -o mirror/bto.owl --retry 4 --max-time 200 && robot --catalog catalog-v001.xml convert -i mirror/bto.owl -o mirror-bto.tmp.owl && mv mirror-bto.tmp.owl tmp/mirror-bto.owl; fi
mkdir -p mirror
if [ true  = true ] && [ true  = true ]; then if cmp -s tmp/mirror-bto.owl mirror/bto.owl ; then echo "Mirror identical, ignoring."; else echo "Mirrors different, updating." && cp tmp/mirror-bto.owl mirror/bto.owl; fi; fi
Mirror identical, ignoring.
cat ../curation/scatlas_seed_table.tsv | cut -f3 -s | sed 's/\r//' | awk '{$1=$1};1' | sed '/^\(http\)/!d' | tr \| \\n  | sort | uniq > ../curation/scatlas_seed.txt
if [ true  = true ] && [ true  = true ]; then curl -L http://purl.obolibrary.org/obo/fbbt.owl --create-dirs -o mirror/fbbt.owl --retry 4 --max-time 200 && robot --catalog catalog-v001.xml convert -i mirror/fbbt.owl -o mirror-fbbt.tmp.owl && mv mirror-fbbt.tmp.owl tmp/mirror-fbbt.owl; fi
if [ true  = true ] && [ true  = true ]; then if cmp -s tmp/mirror-fbbt.owl mirror/fbbt.owl ; then echo "Mirror identical, ignoring."; else echo "Mirrors different, updating." && cp tmp/mirror-fbbt.owl mirror/fbbt.owl; fi; fi
Mirror identical, ignoring.
if [ true  = true ] && [ true  = true ]; then curl -L http://purl.obolibrary.org/obo/uberon.owl --create-dirs -o mirror/uberon.owl --retry 4 --max-time 200 && robot --catalog catalog-v001.xml convert -i mirror/uberon.owl -o mirror-uberon.tmp.owl && mv mirror-uberon.tmp.owl tmp/mirror-uberon.owl; fi

==> log.err <==
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   343  100   343    0     0   1971      0 --:--:-- --:--:-- --:--:--  1982

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 16 64.3M   16 10.4M    0     0  5472k      0  0:00:12  0:00:01  0:00:11 10.0M
 37 64.3M   37 23.8M    0     0  8262k      0  0:00:07  0:00:02  0:00:05 11.6M
 58 64.3M   58 37.5M    0     0  9734k      0  0:00:06  0:00:03  0:00:03 12.3M
 80 64.3M   80 51.9M    0     0  10.4M      0  0:00:06  0:00:04  0:00:02 12.8M
 96 64.3M   96 62.3M    0     0  10.4M      0  0:00:06  0:00:05  0:00:01 12.2M
100 64.3M  100 64.3M    0     0  10.3M      0  0:00:06  0:00:06 --:--:-- 12.6M

==> log.out <==
if [ true  = true ] && [ true  = true ]; then if cmp -s tmp/mirror-uberon.owl mirror/uberon.owl ; then echo "Mirror identical, ignoring."; else echo "Mirrors different, updating." && cp tmp/mirror-uberon.owl mirror/uberon.owl; fi; fi
Mirrors different, updating.
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml convert -I http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-fbbt.owl -o mirror-uberon-bridge-to-fbbt.tmp.owl && mv mirror-uberon-bridge-to-fbbt.tmp.owl tmp/mirror-uberon-bridge-to-fbbt.owl; fi
if [ true  = true ] && [ true  = true ]; then if cmp -s tmp/mirror-uberon-bridge-to-fbbt.owl mirror/uberon-bridge-to-fbbt.owl ; then echo "Mirror identical, ignoring."; else echo "Mirrors different, updating." && cp tmp/mirror-uberon-bridge-to-fbbt.owl mirror/uberon-bridge-to-fbbt.owl; fi; fi
Mirrors different, updating.
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml convert -I http://purl.obolibrary.org/obo/uberon/bridge/cl-bridge-to-fbbt.owl -o mirror-cl-bridge-to-fbbt.tmp.owl && mv mirror-cl-bridge-to-fbbt.tmp.owl tmp/mirror-cl-bridge-to-fbbt.owl; fi
if [ true  = true ] && [ true  = true ]; then if cmp -s tmp/mirror-cl-bridge-to-fbbt.owl mirror/cl-bridge-to-fbbt.owl ; then echo "Mirror identical, ignoring."; else echo "Mirrors different, updating." && cp tmp/mirror-cl-bridge-to-fbbt.owl mirror/cl-bridge-to-fbbt.owl; fi; fi
Mirrors different, updating.
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml convert -I http://purl.obolibrary.org/obo/cl/cl-base.owl -o mirror-cl.tmp.owl && mv mirror-cl.tmp.owl tmp/mirror-cl.owl; fi
if [ true  = true ] && [ true  = true ]; then if cmp -s tmp/mirror-cl.owl mirror/cl.owl ; then echo "Mirror identical, ignoring."; else echo "Mirrors different, updating." && cp tmp/mirror-cl.owl mirror/cl.owl; fi; fi
Mirrors different, updating.
if [ true  = true ]; then robot --catalog catalog-v001.xml merge  -i mirror/fbbt.owl  -i mirror/uberon.owl  -i mirror/uberon-bridge-to-fbbt.owl  -i mirror/cl-bridge-to-fbbt.owl  -i mirror/cl.owl \
  remove --axioms disjoint  -o imports/fbbt_merged.owl; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml extract -i imports/fbbt_merged.owl -T imports/fbbt_terms_combined.txt --force true --method BOT \
    query --update ../sparql/inject-subset-declaration.ru \
    annotate --ontology-iri http://purl.obolibrary.org/obo/scatlas/imports/fbbt_import.owl --version-iri http://purl.obolibrary.org/obo/scatlas/releases/2022-07-21/imports/fbbt_import.owl --output imports/fbbt_import.owl.tmp.owl && mv imports/fbbt_import.owl.tmp.owl imports/fbbt_import.owl; fi
sh ../scripts/generate_sparql_subclass_query.sh tmp/seed.txt components/fbbt_seed_extract.sparql
robot --catalog catalog-v001.xml query --input imports/fbbt_import.owl --query components/fbbt_seed_extract.sparql components/fbbt_simple_seed.txt.tmp.txt && \
cat tmp/seed.txt ../curation/scatlas_relations.txt components/fbbt_simple_seed.txt.tmp.txt | sort | uniq > components/fbbt_simple_seed.txt  && rm components/fbbt_simple_seed.txt.tmp.txt
#sed -i '/BFO_0000001/d' components/fbbt_simple_seed.txt
#sed -i '/BFO_0000002/d' components/fbbt_simple_seed.txt
#sed -i '/BFO_0000003/d' components/fbbt_simple_seed.txt
#sed -i '/BTO_0000000/d' components/fbbt_simple_seed.txt
#sed -i '/UBERON_0000000/d' components/fbbt_simple_seed.txt
#sed -i '/Orphanet_183634/d' components/fbbt_simple_seed.txt
#sed -i '/Orphanet_208593/d' components/fbbt_simple_seed.txt
#sed -i '/orpha.*ObsoleteClass/d' components/fbbt_simple_seed.txt
#comm -2 -3 components/fbbt_simple_seed.txt ../curation/blacklist.txt > components/fbbt_simple_seed.txt
if [ true  = true ]; then robot --catalog catalog-v001.xml merge --input imports/fbbt_import.owl reason --reasoner ELK relax \
    remove --axioms equivalent \
    remove --axioms disjoint \
    remove --term-file ../curation/scatlas_relations.txt --select complement --select object-properties --trim true \
    relax \
    filter --term-file components/fbbt_simple_seed.txt --select "annotations ontology anonymous self" --trim true --signature true \
    reduce -r ELK \
    annotate --ontology-iri http://purl.obolibrary.org/obo/scatlas/components/fbbt.owl --version-iri http://purl.obolibrary.org/obo/scatlas/releases/2022-07-21/components/fbbt.owl --output components/fbbt.owl.tmp.owl && mv components/fbbt.owl.tmp.owl components/fbbt.owl; fi
if [ true  = true ]; then cat imports/cl_terms.txt | grep -v ^# | sort | uniq >  imports/cl_terms_combined.txt; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i mirror/cl.owl --update ../sparql/preprocess-module.ru \
    extract -T imports/cl_terms_combined.txt --force true --copy-ontology-annotations true --individuals include --method BOT \
    query --update ../sparql/inject-subset-declaration.ru --update ../sparql/postprocess-module.ru \
    annotate --ontology-iri http://purl.obolibrary.org/obo/scatlas/imports/cl_import.owl annotate -V http://purl.obolibrary.org/obo/scatlas/releases/2022-07-21/imports/cl_import.owl --annotation owl:versionInfo 2022-07-21 --output imports/cl_import.owl.tmp.owl && mv imports/cl_import.owl.tmp.owl imports/cl_import.owl; fi
sh ../scripts/generate_sparql_subclass_query.sh tmp/seed.txt components/cl_seed_extract.sparql
robot --catalog catalog-v001.xml query --input imports/cl_import.owl --query components/cl_seed_extract.sparql components/cl_simple_seed.txt.tmp.txt && \
cat tmp/seed.txt ../curation/scatlas_relations.txt components/cl_simple_seed.txt.tmp.txt | sort | uniq > components/cl_simple_seed.txt  && rm components/cl_simple_seed.txt.tmp.txt
#sed -i '/BFO_0000001/d' components/cl_simple_seed.txt
#sed -i '/BFO_0000002/d' components/cl_simple_seed.txt
#sed -i '/BFO_0000003/d' components/cl_simple_seed.txt
#sed -i '/BTO_0000000/d' components/cl_simple_seed.txt
#sed -i '/UBERON_0000000/d' components/cl_simple_seed.txt
#sed -i '/Orphanet_183634/d' components/cl_simple_seed.txt
#sed -i '/Orphanet_208593/d' components/cl_simple_seed.txt
#sed -i '/orpha.*ObsoleteClass/d' components/cl_simple_seed.txt
#comm -2 -3 components/cl_simple_seed.txt ../curation/blacklist.txt > components/cl_simple_seed.txt
robot --catalog catalog-v001.xml merge --input imports/cl_import.owl  \
    reason --reasoner ELK  \
    remove --axioms disjoint --trim false --preserve-structure false \
    remove --term-file ../curation/scatlas_relations.txt --select complement --select object-properties --trim true \
    relax \
    filter --term-file components/cl_simple_seed.txt --select "annotations ontology anonymous self" --trim true --signature true \
    annotate --ontology-iri http://purl.obolibrary.org/obo/scatlas/components/cl.owl --version-iri http://purl.obolibrary.org/obo/scatlas/releases/2022-07-21/components/cl.owl --output components/cl.owl.tmp.owl && mv components/cl.owl.tmp.owl components/cl.owl
robot --catalog catalog-v001.xml remove --input scatlas-edit.owl --select imports --trim false \
    merge   -i components/fbbt.owl  -i components/cl.owl -o tmp/merged-scatlas-edit.owl
robot --catalog catalog-v001.xml query -f csv -i tmp/merged-scatlas-edit.owl --query ../sparql/terms.sparql tmp/pre_seed.txt.tmp &&\
cat tmp/pre_seed.txt.tmp | sort | uniq >  tmp/pre_seed.txt
cp ../curation/scatlas_seed.txt tmp/seed.txt
echo 'http://www.ebi.ac.uk/efo/EFO_0000001' >> tmp/seed.txt
if [ true  = true ]; then cat tmp/seed.txt imports/bto_terms.txt | grep -v ^# | sort | uniq >  imports/bto_terms_combined.txt; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i mirror/bto.owl --update ../sparql/preprocess-module.ru \
    extract -T imports/bto_terms_combined.txt --force true --copy-ontology-annotations true --individuals include --method BOT \
    query --update ../sparql/inject-subset-declaration.ru --update ../sparql/postprocess-module.ru \
    annotate --ontology-iri http://purl.obolibrary.org/obo/scatlas/imports/bto_import.owl annotate -V http://purl.obolibrary.org/obo/scatlas/releases/2022-07-21/imports/bto_import.owl --annotation owl:versionInfo 2022-07-21 --output imports/bto_import.owl.tmp.owl && mv imports/bto_import.owl.tmp.owl imports/bto_import.owl; fi
sh ../scripts/generate_sparql_subclass_query.sh tmp/seed.txt components/bto_seed_extract.sparql
robot --catalog catalog-v001.xml query --input imports/bto_import.owl --query components/bto_seed_extract.sparql components/bto_simple_seed.txt.tmp.txt && \
cat tmp/seed.txt ../curation/scatlas_relations.txt components/bto_simple_seed.txt.tmp.txt | sort | uniq > components/bto_simple_seed.txt  && rm components/bto_simple_seed.txt.tmp.txt
#sed -i '/BFO_0000001/d' components/bto_simple_seed.txt
#sed -i '/BFO_0000002/d' components/bto_simple_seed.txt
#sed -i '/BFO_0000003/d' components/bto_simple_seed.txt
#sed -i '/BTO_0000000/d' components/bto_simple_seed.txt
#sed -i '/UBERON_0000000/d' components/bto_simple_seed.txt
#sed -i '/Orphanet_183634/d' components/bto_simple_seed.txt
#sed -i '/Orphanet_208593/d' components/bto_simple_seed.txt
#sed -i '/orpha.*ObsoleteClass/d' components/bto_simple_seed.txt
#comm -2 -3 components/bto_simple_seed.txt ../curation/blacklist.txt > components/bto_simple_seed.txt
robot --catalog catalog-v001.xml merge --input imports/bto_import.owl  \
    reason --reasoner ELK  \
    remove --axioms disjoint --trim false --preserve-structure false \
    remove --term-file ../curation/scatlas_relations.txt --select complement --select object-properties --trim true \
    relax \
    filter --term-file components/bto_simple_seed.txt --select "annotations ontology anonymous self" --trim true --signature true \
    annotate --ontology-iri http://purl.obolibrary.org/obo/scatlas/components/bto.owl --version-iri http://purl.obolibrary.org/obo/scatlas/releases/2022-07-21/components/bto.owl --output components/bto.owl.tmp.owl && mv components/bto.owl.tmp.owl components/bto.owl
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml convert -I http://purl.obolibrary.org/obo/chebi.owl.gz -o mirror-chebi.tmp.owl && mv mirror-chebi.tmp.owl tmp/mirror-chebi.owl; fi

==> log.err <==
make: Circular components/fbbt_seed_extract.sparql <- tmp/seed.txt dependency dropped.
sed: can't read tmp/seed.txt: No such file or directory
make: Circular components/fbbt_simple_seed.txt <- tmp/seed.txt dependency dropped.
cat: tmp/seed.txt: No such file or directory
make: Circular imports/cl_terms_combined.txt <- tmp/seed.txt dependency dropped.
make: Circular components/cl_seed_extract.sparql <- tmp/seed.txt dependency dropped.
sed: can't read tmp/seed.txt: No such file or directory
make: Circular components/cl_simple_seed.txt <- tmp/seed.txt dependency dropped.
cat: tmp/seed.txt: No such file or directory
make: *** [Makefile:324: mirror-chebi] Error 1
Job 2178892 had exit status EXIT, error code 2, check standard out (log.out) and error (log.err) .

==> log.out <==
org.semanticweb.owlapi.model.OWLRuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
Use the -vvv option to show the stack trace.
Use the --help option to see usage information.
rm imports/bto_terms_combined.txt imports/cl_terms_combined.txt

Could you advise on what the problem with tmp/seed.txt could be? Or with the ZLIB error? Or is this expected? The process currently exits with an error code.

Pinging @gouttegd @matentzn as advised by @dosumis. Thanks!

pcm32 commented 2 years ago

I can also add that I ran this with 32 GB of RAM. Does it need more?

matentzn commented 2 years ago

No, it won't need more RAM. This error is the scourge on my back across all ontologies. It happens due to the flakiness of the FTP servers that serve ChEBI at the EBI data center. There is nothing you can do other than try again later.

But looking at the log file, there is an urgent need to review the customised pipeline: the circularity warnings need to be fixed ASAP.
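If retrying by hand gets tedious, a wrapper along these lines might help. It is only a sketch, not part of the pipeline: `retry_gz` is a made-up helper name, and it assumes the file is fetched by a separate command (for example curl) before robot reads it. An "Unexpected end of ZLIB input stream" error means the gzip stream ended early, which `gzip -t` can detect before any conversion is attempted.

```shell
# retry_gz ATTEMPTS DELAY FILE CMD...
# Hypothetical helper: run CMD until FILE exists and passes `gzip -t`,
# waiting DELAY seconds between attempts. Not part of the scatlas pipeline.
retry_gz() {
  attempts=$1; delay=$2; file=$3
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    # gzip -t reads the whole stream and fails if it is truncated or corrupt.
    if "$@" && gzip -t "$file" 2>/dev/null; then
      return 0
    fi
    echo "attempt $i/$attempts failed" >&2
    rm -f "$file"
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Example call (the destination path is an assumption, the URL is from the log):
# retry_gz 4 60 mirror/chebi.owl.gz \
#   curl -fsSL http://purl.obolibrary.org/obo/chebi.owl.gz -o mirror/chebi.owl.gz
```

This only hides transient failures; if the server is down for hours, trying again later is still the only option.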

pcm32 commented 2 years ago

Thanks @matentzn. If I were to override this line:

https://github.com/EBISPOT/scatlas_ontology/blob/3ff99aa4342ca2c6dcda95421e0ba072df60ee8b/project.yaml#L35

with a local path to the ChEBI ontology, would that work to avoid going through the FTP server? Thanks!

matentzn commented 2 years ago

I will try to find a more general solution for you, but it can't be today. Can it wait until Thursday?

It will take me longer now to describe how to do the workaround. This cycle also needs fixing!

pcm32 commented 2 years ago

Sure, Thu should be fine, thanks!

dosumis commented 2 years ago

Thanks Nico. We should plan a review and update of the pipeline with @anitacaron & @gouttegd at some point soon.

gouttegd commented 2 years ago

I believe the circular dependency is similar to a problem previously highlighted in the ODK.

The seed.txt target depends on $(SRCMERGED), which depends on both $(SRC) (the -edit file, which in SCAO is mostly empty as it only contains imports) and on $(OTHERSRC) (which contains the components fbbt.owl and cl.owl). So ultimately, the seed.txt file can only be created once the fbbt.owl and cl.owl components have been generated.

But the fbbt.owl component depends on fbbt_simple_seed.txt, which itself (as all %_simple_seed.txt files) depends on seed.txt. So now we need seed.txt to build fbbt.owl, which we need to build seed.txt. BOOM.
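The loop described above can be reproduced outside make with `tsort`. This is a rough sketch only: the edges are written by hand from the description in this comment (node names such as `SRCMERGED` stand in for the Makefile variables), not extracted from the real Makefile, and `has_cycle` is a made-up helper.

```shell
# has_cycle: read "prerequisite target" pairs on stdin and succeed if tsort
# reports a loop. tsort prints loop diagnostics on stderr, so any stderr
# output is treated as a cycle here.
has_cycle() {
  err=$(tsort 2>&1 >/dev/null)
  [ -n "$err" ]
}

# Illustrative edges only, named after the targets and variables mentioned
# in the comment above; this is not the real Makefile.
edges='fbbt.owl SRCMERGED
SRCMERGED seed.txt
seed.txt fbbt_simple_seed.txt
fbbt_simple_seed.txt fbbt.owl'

if printf '%s\n' "$edges" | has_cycle; then
  echo "cycle: seed.txt needs fbbt.owl, which needs seed.txt"
else
  echo "no cycle"
fi
```

make copes with such a loop by dropping one of the dependencies, which is what the "Circular ... dependency dropped" lines in log.err report.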

gouttegd commented 2 years ago

A quick and dirty workaround would be to patch the standard Makefile to make seed.txt depend only on $(SRC) and not on $(OTHERSRC).
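To illustrate why that breaks the loop, here is a small `tsort` sketch. The edges are hand-written from the description in this thread (node names such as `SRC` and `SRCMERGED` stand in for the Makefile variables) and `build_order` is a made-up name; it is not the actual patch.

```shell
# build_order: print one valid build order for the illustrative graph in
# which seed.txt depends on SRC only. Each pair is "prerequisite target".
build_order() {
  printf '%s\n' \
    'SRC seed.txt' \
    'seed.txt fbbt_simple_seed.txt' \
    'fbbt_simple_seed.txt fbbt.owl' \
    'fbbt.owl SRCMERGED' \
    'SRC SRCMERGED' | tsort
}

build_order
```

With the edge from the merged source to seed.txt gone, the graph has a single valid order: SRC, seed.txt, fbbt_simple_seed.txt, fbbt.owl, SRCMERGED.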

But I think a deeper review of the pipeline to streamline the dependencies would be a much better solution.

matentzn commented 2 years ago

@pcm32

Try the pipeline again now. With a bit of luck it will work even without any parameters. If chebi and/or ncbitaxon give you grief, let me know and I will show you how to skip over them using a specific parameter.