RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Error in Unichem Extraction Script #293

Closed: ecwood closed this issue 1 year ago

ecwood commented 1 year ago

As part of testing for #291, I found that extract-unichem.sh fails. Here's the log file:

+ set -o nounset -o pipefail -o errexit
+ [[ '' == \-\-\h\e\l\p ]]
+ [[ '' == \-\h ]]
+ echo '================= starting extract-unichem.sh ================='
================= starting extract-unichem.sh =================
+ date
Fri Jun 23 15:52:58 UTC 2023
++ dirname extract-unichem.sh
+ config_dir=.
+ source ./master-config.shinc
++ '[' -z ']'
++ test_suffix=
++ BUILD_DIR=/home/ubuntu/kg2-build
++ VENV_DIR=/home/ubuntu/kg2-venv
++ CODE_DIR=/home/ubuntu/kg2-code
++ umls_dir=/home/ubuntu/kg2-build/umls
++ umls_dest_dir=/home/ubuntu/kg2-build/umls/META
++ s3_region=us-west-2
++ s3_bucket=rtx-kg2
++ s3_bucket_public=rtx-kg2-public
++ s3_bucket_versioned=rtx-kg2-versioned
++ s3_cp_cmd='aws s3 cp --no-progress --region us-west-2'
++ mysql_conf=/home/ubuntu/kg2-build/mysql-config.conf
++ curl_get='curl -s -L -f'
++ curies_to_categories_file=/home/ubuntu/kg2-code/curies-to-categories.yaml
++ curies_to_urls_file=/home/ubuntu/kg2-code/curies-to-urls-map.yaml
++ predicate_mapping_file=/home/ubuntu/kg2-code/predicate-remap.yaml
++ infores_mapping_file=/home/ubuntu/kg2-code/kg2-provided-by-curie-to-infores-curie.yaml
++ ont_load_inventory_file=/home/ubuntu/kg2-code/ont-load-inventory.yaml
++ umls2rdf_config_master=/home/ubuntu/kg2-code/umls2rdf-umls.conf
++ rtx_config_file=RTXConfiguration-config.json
++ biolink_model_version=3.1.2
+ output_tsv_file=/home/ubuntu/kg2-build/unichem/unichem-mappings.tsv
+ unichem_dir=/home/ubuntu/kg2-build/unichem
++ dirname /home/ubuntu/kg2-build/unichem/unichem-mappings.tsv
+ unichem_output_dir=/home/ubuntu/kg2-build/unichem
+ unichem_ver=385
+ unichem_ftp_site=ftp://ftp.ebi.ac.uk/pub/databases/chembl/UniChem/data
+ rm -r -f /home/ubuntu/kg2-build/unichem
+ mkdir -p /home/ubuntu/kg2-build/unichem
+ mkdir -p /home/ubuntu/kg2-build/unichem
+ curl -s -L -f ftp://ftp.ebi.ac.uk/pub/databases/chembl/UniChem/data/oracleDumps/UDRI385/UC_XREF.txt.gz -o /home/ubuntu/kg2-build/unichem/UC_XREF.txt.gz
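
A note on reading the log above: the trace ends at the curl call with no error message. That is consistent with `-s` suppressing curl's diagnostics while `-f` makes curl exit non-zero and `errexit` then stops the script. A minimal sketch of a wrapper that would surface the failing URL is below. The names `fetch_or_report` and `CURL_GET` are hypothetical and do not appear in the RTX-KG2 scripts; the only real change to the flags is adding `-S` (`--show-error`).

```shell
# Hypothetical helper, not part of the RTX-KG2 scripts.
# CURL_GET is an assumed override hook; it defaults to the flags from the
# log plus -S (--show-error), so curl prints its error even in silent mode.
fetch_or_report() {
    local url="$1"
    local dest="$2"
    # Left unquoted on purpose so the command string splits into words.
    if ! ${CURL_GET:-curl -sS -L -f} "${url}" -o "${dest}"; then
        echo "download failed: ${url}" >&2
        return 1
    fi
}
```

Assuming the variable names shown in the trace, the call could look like `fetch_or_report "${unichem_ftp_site}/oracleDumps/UDRI${unichem_ver}/UC_XREF.txt.gz" "${unichem_dir}/UC_XREF.txt.gz"`. Because the call sits inside an `if`, `errexit` does not abort before the message is printed.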

ecwood commented 1 year ago

The data now seems to be stored here.

ecwood commented 1 year ago

Correction: the data in the form we used previously now seems to be here.

ecwood commented 1 year ago

One problem with the new structure of UniChem is that it is no longer versioned the way it was before.

ecwood commented 1 year ago

With the changes from 817f37f and 965baf6, the extraction script is now running to completion. I am going to mark this issue for verification.

ecwood commented 1 year ago

I am closing this issue because the code worked in KG2.8.4pre's build.