JetBrains-Research / pubtrends

Scientific literature explorer. Runs a Pubmed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers
Apache License 2.0

Exception during Pubmed DB update #241

Closed: olegs closed this issue 3 years ago

olegs commented 3 years ago

Exception during Pubmed DB update, when running:

java -cp ~/pubtrends-0.9.786.jar org.jetbrains.bio.pubtrends.pm.PubmedLoader --fillDatabase | tee -a ~/postgres-write.log

Exception:

23:26:12.002 [main] PubmedXMLParser INFO  Articles found: 24016, deleted: 30, keywords: 96267, citations: 23636
23:26:12.024 [main] PubmedCrawler   INFO  (108 / 216 total) [update] /tmp/tmp6525210829098710105.tmp/pubmed20n1354.xml.gz: SUCCESS
23:26:12.035 [main] PubmedCrawler   INFO  (109 / 216 total) [update] /tmp/tmp6525210829098710105.tmp/pubmed20n1355.xml.gz: Downloading...
23:26:16.689 [main] PubmedCrawler   INFO  (109 / 216 total) [update] /tmp/tmp6525210829098710105.tmp/pubmed20n1355.xml.gz: Parsing...
23:26:19.095 [main] PubmedXMLParser INFO  Storing articles 1-10000...
23:28:21.754 [main] PubmedXMLParser INFO  Storing articles 10001-20000...
23:30:10.512 [main] PubmedXMLParser INFO  Storing articles 20001-30000...
23:30:23.438 [main] PubmedXMLParser INFO  Deleting 16715 articles
23:30:31.338 [main] PubmedCrawler   INFO  Deleting directory: /tmp/tmp6525210829098710105.tmp
23:30:31.353 [main] PubmedCrawler   INFO  Writing stats to /home/ubuntu/.pubtrends/pubmedpostgreswriter_stats.tsv
Exception in thread "main" org.jetbrains.exposed.exceptions.ExposedSQLException: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
SQL: [Failed on expanding args for DELETE: org.jetbrains.exposed.sql.statements.DeleteStatement@4bff64c2]
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:61)
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:128)
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:122)
        at org.jetbrains.exposed.sql.statements.Statement.execute(Statement.kt:29)
        at org.jetbrains.exposed.sql.statements.DeleteStatement$Companion.where(DeleteStatement.kt:24)
        at org.jetbrains.exposed.sql.QueriesKt.deleteWhere(Queries.kt:28)
        at org.jetbrains.exposed.sql.QueriesKt.deleteWhere$default(Queries.kt:27)
        at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter$delete$1.invoke(PubmedPostgresWriter.kt:154)
        at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter$delete$1.invoke(PubmedPostgresWriter.kt:11)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
        at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter.delete(PubmedPostgresWriter.kt:150)
        at org.jetbrains.bio.pubtrends.pm.PubmedXMLParser.parseData(PubmedXMLParser.kt:452)
        at org.jetbrains.bio.pubtrends.pm.PubmedXMLParser.parse(PubmedXMLParser.kt:95)
        at org.jetbrains.bio.pubtrends.pm.PubmedCrawler.downloadAndProcessFiles(PubmedCrawler.kt:137)
        at org.jetbrains.bio.pubtrends.pm.PubmedCrawler.update(PubmedCrawler.kt:83)
        at org.jetbrains.bio.pubtrends.pm.PubmedLoader.main(PubmedLoader.kt:96)
        Suppressed: org.jetbrains.exposed.exceptions.ExposedSQLException: org.postgresql.util.PSQLException: ERROR: cannot refresh materialized view "public.matview_pmcitations" concurrently
  Hint: Create a unique index with no WHERE clause on one or more columns of the materialized view.
  Where: SQL statement "refresh materialized view concurrently matview_pmcitations"
PL/pgSQL function inline_code_block line 4 at SQL statement
SQL: [
                    do
                    $$
                    begin
                    IF exists (select matviewname from pg_matviews where matviewname = 'matview_pmcitations') THEN
                        refresh materialized view concurrently matview_pmcitations;
                    END IF;
                    end;
                    $$;
                    ]
                at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:61)
                at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:128)
                at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:122)
                at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:101)
                at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:92)
                at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter$close$1.invoke(PubmedPostgresWriter.kt:168)
                at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter$close$1.invoke(PubmedPostgresWriter.kt:11)
                at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
                at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
                at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
                at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
                at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter.close(PubmedPostgresWriter.kt:166)
                at kotlin.io.CloseableKt.closeFinally(Closeable.kt:56)
                at org.jetbrains.bio.pubtrends.pm.PubmedLoader.main(PubmedLoader.kt:65)
        Caused by: org.postgresql.util.PSQLException: ERROR: cannot refresh materialized view "public.matview_pmcitations" concurrently
  Hint: Create a unique index with no WHERE clause on one or more columns of the materialized view.
  Where: SQL statement "refresh materialized view concurrently matview_pmcitations"
PL/pgSQL function inline_code_block line 4 at SQL statement
                at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
                at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178)
                at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
                at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
                at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
                at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
                at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:132)
                at org.jetbrains.exposed.sql.Transaction$exec$2.executeInternal(Transaction.kt:105)
                at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:59)
                ... 13 more
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:333)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
        at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:132)
        at org.jetbrains.exposed.sql.statements.DeleteStatement.executeInternal(DeleteStatement.kt:11)
        at org.jetbrains.exposed.sql.statements.DeleteStatement.executeInternal(DeleteStatement.kt:6)
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:59)
        ... 18 more
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 33430
        at org.postgresql.core.PGStream.sendInteger2(PGStream.java:224)
        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1440)
        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1762)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1326)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:298)
        ... 25 more

olegs commented 3 years ago

After removing the concurrent refresh of the materialized view, the DELETE still fails:

SQL: [Failed on expanding args for DELETE: org.jetbrains.exposed.sql.statements.DeleteStatement@2b9ed6da]
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:61)
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:128)
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:122)
        at org.jetbrains.exposed.sql.statements.Statement.execute(Statement.kt:29)
        at org.jetbrains.exposed.sql.statements.DeleteStatement$Companion.where(DeleteStatement.kt:24)
        at org.jetbrains.exposed.sql.QueriesKt.deleteWhere(Queries.kt:28)
        at org.jetbrains.exposed.sql.QueriesKt.deleteWhere$default(Queries.kt:27)
        at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter$delete$1.invoke(PubmedPostgresWriter.kt:154)
        at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter$delete$1.invoke(PubmedPostgresWriter.kt:11)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:103)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:74)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:57)
        at org.jetbrains.bio.pubtrends.db.PubmedPostgresWriter.delete(PubmedPostgresWriter.kt:150)
        at org.jetbrains.bio.pubtrends.pm.PubmedXMLParser.parseData(PubmedXMLParser.kt:452)
        at org.jetbrains.bio.pubtrends.pm.PubmedXMLParser.parse(PubmedXMLParser.kt:95)
        at org.jetbrains.bio.pubtrends.pm.PubmedCrawler.downloadAndProcessFiles(PubmedCrawler.kt:137)
        at org.jetbrains.bio.pubtrends.pm.PubmedCrawler.update(PubmedCrawler.kt:83)
        at org.jetbrains.bio.pubtrends.pm.PubmedLoader.main(PubmedLoader.kt:96)
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:333)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
        at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:132)
        at org.jetbrains.exposed.sql.statements.DeleteStatement.executeInternal(DeleteStatement.kt:11)
        at org.jetbrains.exposed.sql.statements.DeleteStatement.executeInternal(DeleteStatement.kt:6)
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:59)
        ... 18 more
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 33430
        at org.postgresql.core.PGStream.sendInteger2(PGStream.java:224)
        at org.postgresql.core.v3.QueryExecutorImpl.sendParse(QueryExecutorImpl.java:1440)
        at org.postgresql.core.v3.QueryExecutorImpl.sendOneQuery(QueryExecutorImpl.java:1762)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1326)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:298)
        ... 25 more

olegs commented 3 years ago

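The PostgreSQL JDBC driver (org.postgresql.core.v3.QueryExecutorImpl) sends the number of bind parameters as a signed 2-byte integer, so a single query supports at most Short.MAX_VALUE (32767) parameters. Here, in an attempt to remove 16715 papers, we generate a query with 2 * 16715 = 33430 parameters, which exceeds that limit.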
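
One possible workaround is to split the list of IDs into batches so that no single DELETE exceeds the limit. The sketch below is plain Java and only illustrates the arithmetic and the splitting; it is not the project's actual fix. The class name DeleteBatching, the helper batches, and the constant PARAMS_PER_ID are hypothetical. The value 2 is taken from the comment above, and the limit of 32767 is assumed from the driver version shown in the trace.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteBatching {
    // Largest parameter count that fits in a signed 2-byte integer.
    static final int MAX_BIND_PARAMS = Short.MAX_VALUE; // 32767

    // Assumption from the issue: each deleted paper contributes 2 bind parameters.
    static final int PARAMS_PER_ID = 2;

    /** Splits ids into consecutive batches whose parameter count fits the limit. */
    static <T> List<List<T>> batches(List<T> ids, int paramsPerId) {
        int batchSize = MAX_BIND_PARAMS / paramsPerId;
        if (paramsPerId < 1 || batchSize < 1) {
            throw new IllegalArgumentException("paramsPerId out of range: " + paramsPerId);
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            result.add(new ArrayList<>(ids.subList(from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in IDs; the real ones come from the parsed Pubmed update file.
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 16715; i++) {
            ids.add(i);
        }

        // 16715 * 2 = 33430 parameters, above the 32767 limit.
        System.out.println("Single query params: " + ids.size() * PARAMS_PER_ID);

        // Each batch would be passed to its own DELETE statement.
        for (List<Integer> part : batches(ids, PARAMS_PER_ID)) {
            System.out.println(part.size() + " ids, " + part.size() * PARAMS_PER_ID + " params");
        }
    }
}
```

With 2 parameters per ID, the batch size is 32767 / 2 = 16383, so the 16715 deletions from the log would be split into two statements of 16383 and 332 IDs.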