ecwood closed this issue 11 months ago
This error occurred because there is no semmedVER43_2023_R_WHOLEDB.sql.gz in the S3 bucket, only a semmedVER43_2021_R_WHOLEDB.sql.gz. We might want to document that to update SemMedDB, you have to download a newer copy manually, since the build can't download it automatically. While investigating this, I discovered that the download (which is here) has been split into several parts. This is now a larger task than I expected, because we will have to load each table separately.
We can't import the ENTITY table. It is huge (43G compressed, 248G in MySQL). Importing it causes the instance to run out of disk space, even after I deleted everything we didn't need. As far as @saramsey and I could tell, we don't need it anyway.
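For reference, loading the tables one at a time while skipping ENTITY might look roughly like the sketch below. It is not the actual extract-semmeddb.sh code. The function names are hypothetical, and the file-name pattern is inferred from the semmedVER43_2023_R_CITATIONS.sql.gz name in the build log.

```shell
# Hypothetical helper: print the per-table dump file names to load.
# Arguments: version (e.g. VER43), year (e.g. 2023), then table names.
# ENTITY is skipped because it does not fit on the instance's disk.
semmeddb_dump_files() {
    local version="$1" year="$2"
    shift 2
    local table
    for table in "$@"; do
        if [ "${table}" = "ENTITY" ]; then
            continue
        fi
        # Assumed naming pattern, based on the CITATIONS file in the log.
        echo "semmed${version}_${year}_R_${table}.sql.gz"
    done
}

# Hypothetical helper: load each dump into MySQL, one table at a time.
# Arguments: dump directory, MySQL config file, database name,
# then the same arguments that semmeddb_dump_files takes.
load_semmeddb_tables() {
    local dump_dir="$1" mysql_conf="$2" database="$3"
    shift 3
    local dump_file
    for dump_file in $(semmeddb_dump_files "$@"); do
        # Stop before calling zcat if a dump is absent.
        if [ ! -f "${dump_dir}/${dump_file}" ]; then
            echo "missing dump: ${dump_dir}/${dump_file}" >&2
            return 1
        fi
        zcat "${dump_dir}/${dump_file}" \
            | mysql --defaults-extra-file="${mysql_conf}" --database="${database}"
    done
}
```

A call would pass the directory, config file, and database name from the build log, followed by the version, year, and whichever tables we decide we need. Only CITATIONS and ENTITY are named in this thread, so the rest of the table list would need to be filled in from the download page.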
+ /home/ubuntu/kg2-venv/bin/python3 /home/ubuntu/kg2-code/semmeddb_mysql_to_tuple_list_json.py /home/ubuntu/kg2-build/mysql-config.conf semmeddb VER43 2023 /home/ubuntu/kg2-build/kg2-semmeddb-tuplelist.json
/home/ubuntu/kg2-venv/lib/python3.7/site-packages/rdflib_jsonld/__init__.py:12: DeprecationWarning: The rdflib-jsonld package has been integrated into rdflib as of rdflib==6.0.0. Please remove rdflib-jsonld from your project's dependencies.
DeprecationWarning,
Traceback (most recent call last):
File "/home/ubuntu/kg
During the build (#312), this error occurred in extract-semmeddb.sh:
+ mkdir -p /home/ubuntu/kg2-build/semmeddb
++ /home/ubuntu/kg2-code/get-system-memory-gb.sh
+ mem_gb=374
+ aws s3 cp --no-progress --region us-west-2 s3://rtx-kg2/semmedVER43_2023_R_WHOLEDB.tar.gz /home/ubuntu/kg2-build/semmeddb/
download: s3://rtx-kg2/semmedVER43_2023_R_WHOLEDB.tar.gz to kg2-build/semmeddb/semmedVER43_2023_R_WHOLEDB.tar.gz
+ tar -xf /home/ubuntu/kg2-build/semmeddb/semmedVER43_2023_R_WHOLEDB.tar.gz
+ mysql --defaults-extra-file=/home/ubuntu/kg2-build/mysql-config.conf -e 'DROP DATABASE IF EXISTS semmeddb'
+ mysql --defaults-extra-file=/home/ubuntu/kg2-build/mysql-config.conf -e 'CREATE DATABASE IF NOT EXISTS semmeddb CHARACTER SET utf8 COLLATE utf8_unicode_ci'
+ zcat /home/ubuntu/kg2-build/semmeddb/semmedVER43_2023_R_CITATIONS.sql.gz
+ mysql --defaults-extra-file=/home/ubuntu/kg2-build/mysql-config.conf --database=semmeddb
gzip: /home/ubuntu/kg2-build/semmeddb/semmedVER43_2023_R_CITATIONS.sql.gz: No such file or directory
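A check right after the extraction step would make this kind of failure easier to read, because it would list every expected file that is absent before anything is piped into mysql. A minimal sketch follows, with a hypothetical function name. It is not part of the existing script.

```shell
# Hypothetical helper: report every expected file that is absent from a
# directory. Prints one "missing: <path>" line per absent file to stderr
# and returns non-zero if any file is missing.
report_missing_files() {
    local dir="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ ! -f "${dir}/${name}" ]; then
            echo "missing: ${dir}/${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For the build above, this would be called with /home/ubuntu/kg2-build/semmeddb and the list of per-table dump names, and the script would exit if it returns non-zero.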
I am closing this issue because the code worked in KG2.8.4pre's build.
As part of testing for https://github.com/RTXteam/RTX-KG2/issues/291, I found that extract-semmeddb.sh fails. Here's the log file: