RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

KG2.7.3 build #138

Closed saramsey closed 2 years ago

saramsey commented 3 years ago

Aiming to do a build KG2.7.3 the week of Aug. 30 - Sep. 3, to get the fix for #131 out as soon as possible.

saramsey commented 3 years ago

Hi @acevedol: we might want to hold off a bit to see if we can also get the fix for #141 in the 2.7.3 build, let's discuss

saramsey commented 3 years ago

Hi @acevedol just an FYI, I have committed what I hope is a fix for #141 in the issue-141 branch. I am doing some more thorough testing now. If everything looks good, I will push that commit upstream to the master branch so we can hopefully include that fix in the KG2.7.3 build.

saramsey commented 3 years ago

OK, I have pushed the fix for #141 to the master branch. I think it is ready for inclusion in the KG2.7.3 build.

acevedol commented 3 years ago

Deleted kg2-code on buildkg2.rtx.ai and re-cloned it from the repo.

acevedol commented 3 years ago

From ~/kg2-build, ran:

    source ~/kg2-venv/bin/activate
    python3 ~/kg2-code/validate_provided_by_to_infores_map_yaml.py ~/kg2-code/kg2-provided-by-curie-to-infores-curie.yaml ./infores-catalog.tsv
    deactivate

It ran with no output, and according to https://github.com/RTXteam/RTX-KG2/issues/104#issuecomment-893052649, that should mean there were no errors.
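A "no output means no errors" check can be made explicit by also looking at the exit status. The following is only a minimal sketch: the helper name `ran_cleanly` is hypothetical and is not part of the KG2 code, and whether the validation script sets a non-zero exit status on failure is an assumption.

```python
import subprocess
import sys


def ran_cleanly(cmd):
    """Return True only if cmd exits with status 0 and prints nothing.

    Hypothetical helper for illustration; not part of kg2-code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    silent = not result.stdout.strip() and not result.stderr.strip()
    return result.returncode == 0 and silent


# Stand-in command: a Python process that does nothing and exits 0.
print(ran_cleanly([sys.executable, "-c", "pass"]))
```

Checking the exit status as well as the output guards against a script that fails silently.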

Screen Shot 2021-09-01 at 12 55 25 PM
acevedol commented 3 years ago

Ran a dry run with bash -x ~/kg2-code/build-kg2-snakemake.sh all -F -n. The log file shows all 48 jobs, as expected.

acevedol commented 3 years ago

Running full build with bash -x ~/kg2-code/build-kg2-snakemake.sh all -F

acevedol commented 3 years ago

Build crashed at DrugCentral rule

Screen Shot 2021-09-02 at 8 39 24 AM
acevedol commented 3 years ago

The error in extract-drugcentral.log is: Error: role "jjyang" does not exist

acevedol commented 3 years ago

I'm not sure yet whether this is an actual problem, but kg2-build/build-kg2-snakemake.log shows an error in the UniChem rule while the script is still running.

Screen Shot 2021-09-07 at 4 14 34 PM
acevedol commented 3 years ago

After changes to extract-drugcentral and a few command-line corrections, extract-drugcentral finished correctly. The jjyang error came from the role being created out of order.

acevedol commented 3 years ago

Another error

Screen Shot 2021-09-08 at 8 13 08 PM

Presumably this is due to the UniChem error, since the DrugCentral script was able to complete successfully.

acevedol commented 3 years ago

Unichem error using curl -v

Screen Shot 2021-09-09 at 10 45 29 AM
acevedol commented 3 years ago

UniChem seems to be an access control problem. The script uses an anonymous user and is denied access. I'm digging in https://www.ebi.ac.uk for possible solutions.

acevedol commented 3 years ago

UDRI version 385 appears to be current. 375 no longer available

Screen Shot 2021-09-09 at 1 24 12 PM
acevedol commented 3 years ago

Error in build-multi-ont-kg.log

Screen Shot 2021-09-09 at 2 42 16 PM Screen Shot 2021-09-09 at 2 42 58 PM
acevedol commented 3 years ago

I am still stuck on the above error. This line seems to be where it's stuck:

    /usr/bin/java -Xms2G -Xmx255683G -DentityExpansionLimit=4086000 -Djava.awt.headless=true -classpath /home/ubuntu/kg2-build/owltools owltools.cli.CommandLineInterface biolink-model.owl.ttl -o -f json /tmp/kg2-97rgzijn.json

but I can't find where to change the settings for the VM stack size.

saramsey commented 3 years ago

OK, this portion of the shell command strongly indicates a bug:

/usr/bin/java -Xms2G -Xmx255683G

since 255683G is 255 Terabytes (!). I am checking on the cause of this bug now...
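The arithmetic can be sketched as follows. One reading, which is an assumption and not confirmed in this thread, is that 255683 was a megabyte count that picked up a `G` suffix: 255683 MiB is about 249 GiB, which would be consistent with the `-Xmx249G` that appears in a later log in this issue. The function name and the use of `/proc/meminfo`-style input are illustrative only; this is not the content of get-system-memory.sh.

```python
def xmx_flag_from_meminfo(meminfo_text):
    """Build a JVM -Xmx flag in whole GiB from /proc/meminfo-style text.

    Hypothetical helper; /proc/meminfo reports MemTotal in kB (KiB).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            gib = kib // (1024 * 1024)  # KiB -> GiB, rounded down
            return f"-Xmx{gib}G"
    raise ValueError("MemTotal line not found")


# 255683 MiB expressed in KiB (255683 * 1024), used as sample input.
sample = "MemTotal:       261819392 kB\n"
print(xmx_flag_from_meminfo(sample))  # -Xmx249G
```

Keeping the unit conversion in one place makes it harder for a value in one unit to end up with another unit's suffix.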

saramsey commented 3 years ago

I suspect my code changes to get-system-memory.sh for #137 caused this bug. Testing that hunch now...

saramsey commented 3 years ago

Commit 40d9a6a should fix the problem with build-multi-ont-kg.log

acevedol commented 3 years ago

Another error in Ontologies rule

Screen Shot 2021-09-14 at 7 02 28 PM
acevedol commented 3 years ago

The fix above for get-system-memory.sh did correct the problem. Thank you, Steve!

acevedol commented 3 years ago

Ontology exited with error

Reading ontology file: foodon.owl; size: 6944.69 KiB
/usr/bin/java -Xms2G -Xmx249G -DentityExpansionLimit=4086000 -Djava.awt.headless=true -classpath /home/ubuntu/kg2-build/owltools owltools.cli.CommandLineInterface foodon.owl -o -f json /tmp/kg2-_afrl46i.json
2021-09-15 02:38:43,998 ERROR (CommandRunner:4815) could not parse:foodon.owl
org.semanticweb.owlapi.model.UnloadableImportException: Could not load imported ontology: <http://purl.obolibrary.org/obo/foodon/imports/dietary_supplement_import.owl> Cause: https://raw.githubusercontent.com/FoodOntology/foodon/master/imports/dietary_supplement_import.owl
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.makeLoadImportRequest(OWLOntologyManagerImpl.java:1870)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.TripleHandlers$TPImportsHandler.handleTriple(TripleHandlers.java:1537)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.TripleHandlers$HandlerAccessor.handleStreaming(TripleHandlers.java:194)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer.statementWithResourceValue(OWLRDFConsumer.java:1545)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.statementWithResourceValue(RDFParser.java:370)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.EmptyPropertyElement.startElement(StartRDF.java:236)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.PropertyElementList.startElement(StartRDF.java:658)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.startElement(RDFParser.java:201)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
    at org.apache.xerces.impl.dtd.XMLDTDValidator.emptyElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParser.parse(RDFParser.java:145)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser.parse(RDFXMLParser.java:73)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:220)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1254)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1208)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1108)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1064)
    at owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:163)
    at owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:150)
    at owltools.io.ParserWrapper.parse(ParserWrapper.java:132)
    at owltools.cli.CommandRunner.runSingleIteration(CommandRunner.java:4803)
    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:76)
    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:68)
    at owltools.cli.CommandLineInterface.main(CommandLineInterface.java:12)
Caused by: org.semanticweb.owlapi.io.OWLOntologyCreationIOException: https://raw.githubusercontent.com/FoodOntology/foodon/master/imports/dietary_supplement_import.owl
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:230)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1254)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1208)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1108)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadImports(OWLOntologyManagerImpl.java:1825)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.makeLoadImportRequest(OWLOntologyManagerImpl.java:1863)
    ... 33 more
Caused by: java.io.FileNotFoundException: https://raw.githubusercontent.com/FoodOntology/foodon/master/imports/dietary_supplement_import.owl
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1974)
    at java.base/sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1969)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1968)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStreamFromContentEncoding(AbstractOWLParser.java:179)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStream(AbstractOWLParser.java:141)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputSource(AbstractOWLParser.java:264)
    at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser.parse(RDFXMLParser.java:72)
    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:220)
    ... 38 more
Caused by: java.io.FileNotFoundException: https://raw.githubusercontent.com/FoodOntology/foodon/master/imports/dietary_supplement_import.owl
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1920)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:3099)
    at java.base/java.net.URLConnection.getContentEncoding(URLConnection.java:530)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getContentEncoding(HttpsURLConnectionImpl.java:406)
    at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStream(AbstractOWLParser.java:136)
    ... 41 more
Traceback (most recent call last):
  File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 1362, in <module>
    save_pickle)
  File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 143, in make_kg2
    save_pickle)
  File "/home/ubuntu/kg2-code/multi_ont_to_json_kg.py", line 67, in load_ont_file_return_ontology_and_metadata
    ontology = kg2_util.make_ontology_from_local_file(file_name, save_pickle=save_pickle)
  File "/home/ubuntu/RTX-KG2/kg2_util.py", line 783, in make_ontology_from_local_file
    check=True)
  File "/usr/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['owltools', 'foodon.owl', '-o', '-f', 'json', '/tmp/kg2-_afrl46i.json']' returned non-zero exit status 1.
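In a trace this long, the useful part is the last "Caused by:" entry, which here is a java.io.FileNotFoundException for the dietary_supplement_import.owl URL that foodon.owl imports. A small sketch of pulling that out of a log automatically follows. The helper name `root_cause` is hypothetical, and the sample text is a shortened stand-in, not the real log.

```python
import re


def root_cause(trace_text):
    """Return the message of the last 'Caused by:' entry in a Java trace.

    Works whether the stack frames are on separate lines or flattened
    onto one line, since both put whitespace + 'at' before each frame.
    Returns None when the trace has no 'Caused by:' entry.
    """
    marker = "Caused by: "
    index = trace_text.rfind(marker)
    if index == -1:
        return None
    rest = trace_text[index + len(marker):]
    # The message ends where the first stack frame begins.
    return re.split(r"\s+at\s", rest, maxsplit=1)[0].strip()


# Shortened, made-up sample in the same shape as the trace above.
sample = (
    "Caused by: pkg.OuterException: wrapper\n"
    "    at pkg.A.run(A.java:1)\n"
    "Caused by: java.io.FileNotFoundException: https://example.org/x.owl\n"
    "    at pkg.B.open(B.java:2)\n"
)
print(root_cause(sample))
```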

acevedol commented 3 years ago

Error in Rule Simplify

relation curie is missing from the YAML config file: RO:0002470
There are relation curies missing from the yaml config file. Please add them and try again. Exiting.
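The check behind this message can be sketched as a set difference between the relation CURIEs seen on edges and the keys of the YAML config. This is a guess at the shape of the check, not the actual code of the Simplify step. The function name and the dictionary layout of the config are hypothetical, and `RO:0000056` is used only as a placeholder key.

```python
def missing_relation_curies(edge_relation_curies, config_mapping):
    """Return, sorted, the relation CURIEs that have no config entry.

    edge_relation_curies: iterable of CURIE strings seen on edges.
    config_mapping: dict keyed by relation CURIE (assumed layout).
    """
    return sorted(set(edge_relation_curies) - set(config_mapping))


# Placeholder data: the config knows one relation, the edges use two.
config = {"RO:0000056": {"operation": "keep"}}
edges = ["RO:0000056", "RO:0002470", "RO:0002470"]
print(missing_relation_curies(edges, config))  # ['RO:0002470']
```

Reporting every missing CURIE in one pass means the config file can be fixed once, without rerunning the rule for each missing entry.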
acevedol commented 3 years ago

KG2 build completed. The complete log is at /home/ubuntu/.snakemake/log/2021-09-16T181654.939719.snakemake.log

acevedol commented 3 years ago

On the build instance, kg2-version.txt shows 2.7.3. Tagged the repo with KG2.7.3.

acevedol commented 3 years ago

kg2-simplified-report.json grew from 48 KB to 50 KB, with 4308 more edges and 1525 more nodes.
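A comparison like this can be scripted by diffing the numeric fields of two report files. The sketch below uses made-up baseline counts and hypothetical key names (`nodes`, `edges`). Only the two differences, 1525 and 4308, come from this thread, and the real layout of kg2-simplified-report.json is not assumed.

```python
def count_deltas(old_report, new_report):
    """Return new minus old for every numeric key present in both."""
    return {
        key: new_report[key] - old_report[key]
        for key in old_report
        if key in new_report
        and isinstance(old_report[key], (int, float))
        and isinstance(new_report[key], (int, float))
    }


# Made-up baseline values, chosen so the deltas match the comment above.
old = {"nodes": 1000, "edges": 2000, "version": "old"}
new = {"nodes": 2525, "edges": 6308, "version": "new"}
print(count_deltas(old, new))  # {'nodes': 1525, 'edges': 4308}
```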

acevedol commented 3 years ago

I accidentally deleted the checklist at the top of this

acevedol commented 3 years ago

Installed new TSV files on kg2endpoint4.rtx.ai. The log file kg2-build/setup-kg2-neo4j.log ends with ======= script finished ======

acevedol commented 3 years ago

Updated CNAME for kg2endpoint4.rtx.ai to point to kg2endpoint-kg2-7-3.rtx.ai

acevedol commented 3 years ago

Made directory on rtxconfig@arax.ncats.io for KG2.7.3

acevedol commented 3 years ago

Updated kg2c_config.json

Screen Shot 2021-09-17 at 10 12 39 PM
acevedol commented 3 years ago

Error running the synonymizer. It appears similar to an error from the last build.

Screen Shot 2021-09-17 at 10 20 21 PM
amykglen commented 3 years ago

ok, fix is pushed to the kg2integration branch (in the RTX repo). (was related to Finn's changes to configv2.json yesterday -- made some tweaks so the KG2c build shouldn't be so fragile in terms of config files anymore in https://github.com/RTXteam/RTX/commit/2509ca4314d67a1e5185ba2bcc18b02c0cc27fad)

also worth noting that this KG2c build should be done from the kg2integration branch - probably should add a step to the build checklist to touch base about which branch the KG2c build should be done from. :)

acevedol commented 3 years ago

Added "Check with Amy which branch to use for building kg2c" to checklist

acevedol commented 3 years ago

NodeSynonymizer build finished with 2021-09-18 20:43:23,810 INFO: Done building synonymizer.

acevedol commented 3 years ago

Checked arax.ncats.io to make sure synonymizer files are present

Screen Shot 2021-09-18 at 1 50 35 PM

acevedol commented 3 years ago

Updated kg2c_config.json to build kg2c

Screen Shot 2021-09-18 at 1 48 40 PM
acevedol commented 3 years ago

BuildKG2C completed and the files on arax.ncats.io are slightly larger than in KG2.7.2C

Screen Shot 2021-09-19 at 9 13 29 AM
acevedol commented 3 years ago

Loading kg2c into Neo4J on kg2canonicalized.rtx.ai with bash -x RTX/code/kg2c/tsv-to-neo4j-canonicalized.sh

acevedol commented 3 years ago

Neo4J loading completed with

Mon Sep 20 17:46:33 UTC 2021
+ echo '================ script finished ============================'
================ script finished ============================
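Several steps in this build are confirmed by eye, by checking that the log ends with a "script finished" line. A minimal sketch of the same check in code follows. The helper name `log_finished` and the default marker string are assumptions for illustration, not part of the RTX or KG2 scripts.

```python
def log_finished(log_text, marker="script finished"):
    """Return True if the last non-blank line of the log has the marker."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and marker in lines[-1]


# Sample text copied from the tail of the log quoted above.
tail = (
    "Mon Sep 20 17:46:33 UTC 2021\n"
    "+ echo '================ script finished ============================'\n"
    "================ script finished ============================\n"
)
print(log_finished(tail))  # True
```

Checking only the last non-blank line avoids a false positive from the `echo` command that `bash -x` traces before the marker is actually printed.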
acevedol commented 3 years ago

Updated CNAME for kg2canonicalized.rtx.ai

Screen Shot 2021-09-20 at 10 48 44 AM
acevedol commented 3 years ago

Validated results at http://kg2-7-3c.rtx.ai:7474/browser/ and results make sense

acevedol commented 2 years ago

This was finished a while back