IHTSDO / snowstorm

Scalable SNOMED CT Terminology Server using Elasticsearch

Unexplained cause of failed import #10

Closed by nhnicwaller 6 years ago

nhnicwaller commented 6 years ago

I'm trying to set up snowstorm for the first time, and I'm running into a bit of trouble. I'm starting up snowstorm and doing the import immediately on launch.

java -Xmx4g -jar /opt/snowstorm-2.1.0.jar --delete-indices --import=/opt/SnomedCT.zip

After running for a while, the import appears to fail.

2018-11-07 18:02:17.624 ERROR 106 --- [pool-5-thread-2] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.625 ERROR 106 --- [ool-5-thread-14] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.752 ERROR 106 --- [pool-5-thread-1] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.757 ERROR 106 --- [ool-5-thread-15] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
2018-11-07 18:02:17.788 ERROR 106 --- [pool-5-thread-5] o.ihtsdo.otf.snomedboot.ReleaseImporter : Failed to read or process lines.
[...]
2018-11-07 18:02:17.807 ERROR 106 --- [ main] o.s.s.core.rf2.rf2import.ImportService : Failed RF2 SNAPSHOT import on branch MAIN. ID 1a2bff8f-8a00-4d80-8047-b056b90859fe

I see stack traces for a few occurrences of UncategorizedExecutionException, all of which are caused by java.net.ConnectException (Connection refused). All of this concludes with the Spring application context shutting down.
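When a Java stack trace chains several exceptions, the last "Caused by:" line is normally the innermost cause, which is how the ConnectException above can be picked out from under the UncategorizedExecutionException wrappers. A minimal sketch of that lookup follows; the function name `root_cause` is made up for illustration, and it assumes the standard `Caused by: <class>: <message>` layout that the JVM prints.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "Caused by: java.net.ConnectException: Connection refused".
# The message part is optional because some exceptions carry no message.
CAUSED_BY = re.compile(r"^\s*Caused by: (?P<cls>[\w.$]+)(?:: (?P<msg>.*))?$")


def root_cause(trace: str) -> Optional[Tuple[str, str]]:
    """Return (exception class, message) from the last 'Caused by:' line.

    Returns None when the text contains no 'Caused by:' line at all.
    """
    last = None
    for line in trace.splitlines():
        match = CAUSED_BY.match(line)
        if match:
            last = (match.group("cls"), match.group("msg") or "")
    return last
```

Feeding it the full snowstorm console output should return the ConnectException pair if the traces follow the layout described, which points at a network or service problem and not at the RF2 files.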

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2018-11-07 18:02:18.270 ERROR 106 --- [ main] o.s.boot.SpringApplication : Application run failed
java.lang.IllegalStateException: Failed to execute ApplicationRunner

I'm using SnomedCT_RF2Release_CDN_20181031 obtained through Canada Health Infoway.

Shutting down after a failed import seems like a reasonable approach, but none of this output really helps me identify the specific file(s) or line(s) that are causing a problem with import. It would be helpful to provide more information here, perhaps by logging the names of files that are being opened, before they are fully processed.

kaicode commented 6 years ago

Hi @nhnicwaller,

If you are getting a ConnectException, the failing connection is probably the one to Elasticsearch. Make sure Elasticsearch is up and running.

Elasticsearch listens on port 9200 by default, so try accessing http://localhost:9200/. You should see something like:

{
  "name": "rGIGBoc",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "Ur8O6MeNQDOBcSNRSwaL3w",
  "version": {
    "number": "6.4.2",
    "build_flavor": "default",
    "build_type": "tar",
    "build_hash": "04711c2",
    "build_date": "2018-09-26T13:34:09.098244Z",
    "build_snapshot": false,
    "lucene_version": "7.4.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
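The same check can be scripted so it runs before the import starts. The sketch below uses only the Python standard library; the function names `parse_version` and `elasticsearch_version` are invented for this example, and the default URL assumes Elasticsearch is on localhost with the default port.

```python
import http.client
import json
import urllib.request
from typing import Optional


def parse_version(body: str) -> Optional[str]:
    """Return version.number from an Elasticsearch root response, or None."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    version = doc.get("version")
    if not isinstance(version, dict):
        return None
    number = version.get("number")
    return number if isinstance(number, str) else None


def elasticsearch_version(url: str = "http://localhost:9200/",
                          timeout: float = 3.0) -> Optional[str]:
    """Fetch the root endpoint and return the reported version.

    Returns None if the connection is refused, times out, or the reply
    is not the expected JSON document.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_version(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError (including "Connection refused") is a subclass of OSError.
        return None
```

A `None` result before or during the import suggests the problem is with Elasticsearch itself, not with the release files.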

Next, check the Elasticsearch logs during the import to see whether it is running out of disk space.

I hope that helps!

Kai

nhnicwaller commented 6 years ago

Thanks Kai, that was it! My docker runtime was killing the Elasticsearch process because Elasticsearch was demanding more memory than docker could provide. Here's what I saw at the end of my Elasticsearch log:

==> /var/log/elasticsearch.log <==
[2018-11-08T01:28:05,026][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started
Killed

According to my Docker preferences, I had allocated only 2.0 GiB to Docker. Of course that wasn't enough, because I had also assigned -Xmx2g to Elasticsearch. I bumped up the Docker allocation to 5.0 GiB and then the import succeeded.
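The arithmetic behind this can be sketched roughly. A JVM needs memory beyond its maximum heap (metaspace, thread stacks, off-heap buffers), so a heap limit equal to the whole allocation leaves nothing for the rest. The helper names below are invented, and the 25% overhead figure is an arbitrary assumption for illustration, not a measured or recommended value.

```python
import re
from typing import Iterable

GIB = 1024 ** 3
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": GIB}


def parse_xmx(flag: str) -> int:
    """Convert a JVM flag such as '-Xmx2g' to a byte count."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError("not an -Xmx flag: %r" % flag)
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def heaps_fit(limit_bytes: int, heap_flags: Iterable[str],
              overhead_ratio: float = 0.25) -> bool:
    """Rough check: do the heaps plus an assumed overhead fit in the limit?

    overhead_ratio is a guess at non-heap JVM memory; tune it for real use.
    """
    needed = sum(parse_xmx(flag) for flag in heap_flags) * (1 + overhead_ratio)
    return needed <= limit_bytes
```

With these assumptions, `heaps_fit(2 * GIB, ["-Xmx2g"])` is false and `heaps_fit(5 * GIB, ["-Xmx2g"])` is true, matching what happened here. Any other process sharing the same allocation would need to be added to the list.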