JetBrains-Research / pubtrends

Scientific literature explorer. Runs a Pubmed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers
Apache License 2.0

Out Of Memory during update #219

Closed olegs closed 3 years ago

olegs commented 4 years ago
unit-1199:~ oleg$ ssh -i ~/Documents/pubtrends-server.pem ubuntu@ec2-54-171-39-244.eu-west-1.compute.amazonaws.com
09:27:31.064 [main] PubmedCrawler   INFO  (1 / 4 update) /tmp/tmp7920915780324424879.tmp/pubmed20n1103.xml.gz: Downloading...
09:27:46.483 [main] PubmedCrawler   INFO  (1 / 4 update) /tmp/tmp7920915780324424879.tmp/pubmed20n1103.xml.gz: Parsing...
09:27:51.230 [main] PubmedXMLParser INFO  Storing articles 1-10000...
09:27:51.854 [main] PubmedCrawler   ERROR Lost connection to neo4j database
09:27:51.855 [main] PubmedCrawler   INFO  Deleting directory: /tmp/tmp7920915780324424879.tmp
09:27:51.855 [main] PubmedCrawler   INFO  Writing stats to /home/ubuntu/.pubtrends/pubmed_stats.tsv
09:27:51.855 [main] PubmedLoader    ERROR org.jetbrains.bio.pubtrends.pm.PubmedCrawlerException: org.neo4j.driver.v1.exceptions.ServiceUnavailableException: Unable to connect to 172.30.0.246:7687, ensure the database is running and that there is a working network connection to it.
09:27:51.855 [main] PubmedLoader    INFO  Waiting for 1024 seconds...
^C^C
ubuntu@ip-172-30-0-246:~$ java -cp pubtrends-0.2.386.jar org.jetbrains.bio.pubtrends.pm.PubmedLoader --fillDatabase | tee pm.log3
09:46:02.487 [main] PubmedLoader    INFO  Arguments:
[--fillDatabase]
09:46:02.500 [main] PubmedLoader    INFO  Config path: /home/ubuntu/.pubtrends/config.properties
09:46:02.501 [main] PubmedLoader    INFO  Init Neo4j database connection
Mar 06, 2020 9:46:02 AM org.neo4j.driver.internal.logging.JULogger info
INFO: Direct driver instance 1122606666 created for server address 172.30.0.246:7687
09:46:03.743 [main] PubmedLoader    INFO  Checking Pubmed FTP...
09:46:03.744 [main] PubmedLoader    INFO  Retrying downloading after any problems.
09:46:03.744 [main] PubmedLoader    INFO  Init Pubmed processor
09:46:03.777 [main] PubmedLoader    INFO  Init crawler
09:46:03.780 [main] PubmedCrawler   INFO  Collecting stats in /home/ubuntu/.pubtrends/pubmed_stats.tsv
09:46:03.781 [main] PubmedCrawler   INFO  Found crawler progress /home/ubuntu/.pubtrends/pubmed_last.tsv
09:46:03.793 [main] PubmedCrawler   INFO  Last downloaded file: pubmed20n1102.xml.gz
09:46:03.799 [main] PubmedCrawler   INFO  Created temporary directory: /tmp/tmp8099827598611918652.tmp
09:46:03.819 [main] PubmedFTPHandler INFO  Connecting to ftp.ncbi.nlm.nih.gov
09:46:04.637 [main] PubmedFTPHandler INFO  Fetching baseline files
09:46:05.792 [main] PubmedFTPHandler INFO  Fetching update files
09:46:06.355 [main] PubmedCrawler   INFO  Found 4 new file(s)
Baseline: 0, Updates: 4
09:46:06.355 [main] PubmedCrawler   INFO  Processing baseline
09:46:06.356 [main] PubmedCrawler   INFO  Processing updates
09:46:06.356 [main] PubmedCrawler   INFO  (1 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1103.xml.gz: Downloading...
09:46:17.522 [main] PubmedCrawler   INFO  (1 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1103.xml.gz: Parsing...
09:46:27.882 [main] PubmedXMLParser INFO  Storing articles 1-10000...
09:58:02.639 [main] PubmedXMLParser INFO  Storing articles 10001-20000...
10:08:03.515 [main] PubmedXMLParser INFO  Storing articles 20001-30000...
10:13:59.208 [main] PubmedXMLParser INFO  Deleting 82 articles
10:13:59.622 [main] PubmedXMLParser INFO  Articles found: 30000, deleted: 82, keywords: 63169, citations: 470761
10:13:59.626 [main] PubmedCrawler   INFO  (1 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1103.xml.gz: SUCCESS
10:13:59.630 [main] PubmedCrawler   INFO  (2 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1104.xml.gz: Downloading...
10:14:04.864 [main] PubmedCrawler   INFO  (2 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1104.xml.gz: Parsing...
10:14:08.276 [main] PubmedXMLParser INFO  Storing articles 1-7189...
10:15:53.816 [main] PubmedXMLParser INFO  Articles found: 7189, deleted: 0, keywords: 22901, citations: 18335
10:15:53.820 [main] PubmedCrawler   INFO  (2 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1104.xml.gz: SUCCESS
10:15:53.822 [main] PubmedCrawler   INFO  (3 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1105.xml.gz: Downloading...
10:16:09.274 [main] PubmedCrawler   INFO  (3 / 4 update) /tmp/tmp8099827598611918652.tmp/pubmed20n1105.xml.gz: Parsing...
10:16:14.459 [main] PubmedXMLParser INFO  Storing articles 1-10000...
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000095080000, 108003328, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
[hs_err_pid16769.log](https://github.com/JetBrains-Research/pubtrends/files/4297815/hs_err_pid16769.log)

olegs commented 3 years ago

Obsolete: the pubtrends.net application is now hosted on a machine with more RAM.
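
For anyone reading the log above on a similarly small instance: `os::commit_memory(..., 108003328, 0)` failing with errno=12 (ENOMEM) means the operating system refused the JVM's request to commit that many bytes, so the host itself ran out of memory rather than the Java heap filling up. Judging by the addresses in the log, Neo4j was running on the same host as the loader, so both were competing for the same RAM. The snippet below is only a minimal sketch for inspecting the JVM's own limits before an update; the class name `HeapReport` and the helper `toMiB` are hypothetical and not part of pubtrends.

```java
// Minimal sketch (not pubtrends code): print the JVM's heap limits so they can be
// compared against the memory actually available on the host.
class HeapReport {

    /** Converts a byte count to whole mebibytes (integer division). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Size of the commit request that failed in the log above.
        System.out.println("Failed commit request: " + toMiB(108003328L) + " MiB");

        // Upper bound the heap may grow to (controlled by -Xmx).
        System.out.println("Max heap:              " + toMiB(rt.maxMemory()) + " MiB");

        // Heap currently reserved by the JVM, and the unused part of it.
        System.out.println("Heap committed so far: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free within committed: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

On Java 11 or later this can be run directly with `java HeapReport.java`. Adding an explicit cap such as `-Xmx512m` (an illustrative value, not a recommendation from this thread) shows how the reported maximum changes. If the maximum heap plus the database's footprint exceeds the host's RAM, a failure like the one above is likely.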