JetBrains-Research / pubtrends

Scientific literature explorer. Runs a PubMed or Semantic Scholar search and lets the user explore the high-level structure of the resulting papers
Apache License 2.0

Problem creating text index on SSPublications table #318

Closed. olegs closed this issue 2 years ago

olegs commented 2 years ago
2022-08-20 10:48:29,273 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:57] Creating index ss_title_abstract_index
2022-08-20 12:23:03,235 WARN [main] Exposed [ThreadLocalTransactionManager.kt:169] Transaction attempt #0 failed: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.. Statement(s): CREATE INDEX IF NOT EXISTS ss_title_abstract_index ON SSPublications using GIN (tsv);
org.jetbrains.exposed.exceptions.ExposedSQLException: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.       
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:61)                                                       
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:184)                                                                          
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:126)                                                                          
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:102)                                                                          
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:93)                                                                           
        at org.jetbrains.bio.pubtrends.db.SemanticScholarPostgresWriter$1.invoke(SemanticScholarPostgresWriter.kt:58)                              
        at org.jetbrains.bio.pubtrends.db.SemanticScholarPostgresWriter$1.invoke(SemanticScholarPostgresWriter.kt:33)                              
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction$run(ThreadLocalTransactionManager.kt:156)  
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.access$inTopLevelTransaction$run(ThreadLocalTransactionManager.kt:1)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt$inTopLevelTransaction$1.invoke(ThreadLocalTransactionManager.kt:197)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.keepAndRestoreTransactionRefAfterRun(ThreadLocalTransactionManager.kt:205)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:196)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt$transaction$1.invoke(ThreadLocalTransactionManager.kt:134)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.keepAndRestoreTransactionRefAfterRun(ThreadLocalTransactionManager.kt:205)
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:106)                
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:104)                
        at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction$default(ThreadLocalTransactionManager.kt:103)        
        at org.jetbrains.bio.pubtrends.db.SemanticScholarPostgresWriter.<init>(SemanticScholarPostgresWriter.kt:33)                                
        at org.jetbrains.bio.pubtrends.ss.SemanticScholarLoader.main(SemanticScholarLoader.kt:52)                                                  
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:383)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:181)
        at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:149)
        at org.jetbrains.exposed.sql.Transaction$exec$2.executeInternal(Transaction.kt:107)
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:59)
        ... 18 common frames omitted 
Caused by: java.io.EOFException: null

Database log:

2022-08-20 12:23:02.984 UTC [1] LOG:  server process (PID 32) was terminated by signal 9: Killed
2022-08-20 12:23:02.984 UTC [1] DETAIL:  Failed process was running: CREATE INDEX IF NOT EXISTS ss_title_abstract_index ON SSPublications using GIN (tsv)
2022-08-20 12:23:02.986 UTC [1] LOG:  terminating any other active server processes
2022-08-20 12:23:03.252 UTC [135] FATAL:  the database system is in recovery mode
2022-08-20 12:23:03.261 UTC [136] FATAL:  the database system is in recovery mode
2022-08-20 12:23:03.267 UTC [137] FATAL:  the database system is in recovery mode
2022-08-20 12:23:03.270 UTC [1] LOG:  all server processes terminated; reinitializing
2022-08-20 12:23:03.899 UTC [139] FATAL:  the database system is in recovery mode
2022-08-20 12:23:03.900 UTC [138] LOG:  database system was interrupted; last known up at 2022-08-20 11:41:29 UTC
2022-08-20 12:23:03.969 UTC [138] LOG:  database system was not properly shut down; automatic recovery in progress
2022-08-20 12:23:03.974 UTC [138] LOG:  redo starts at 16B4/87C4D2D8
2022-08-20 12:23:29.195 UTC [138] LOG:  invalid record length at 16B5/4BB61EF0: wanted 24, got 0
2022-08-20 12:23:29.195 UTC [138] LOG:  redo done at 16B5/4BB61EB0 system usage: CPU: user: 5.32 s, system: 5.54 s, elapsed: 25.22 s
2022-08-20 12:24:32.113 UTC [1] LOG:  database system is ready to accept connections
olegs commented 2 years ago

According to the log and stack trace below, the index on the tsvector column (tsv) was built successfully. However, multiple indexes shouldn't be built in a single transaction: a failure in a later step rolls back the steps that had already completed.

2022-08-20 13:37:23,256 INFO [main] o.j.b.p.s.SemanticScholarLoader [SemanticScholarLoader.kt:47] Config path: /home/ubuntu/.pubtrends/config.properties
2022-08-20 13:37:23,288 INFO [main] o.j.b.p.s.SemanticScholarLoader [SemanticScholarLoader.kt:51] Init Postgresql database connection              
2022-08-20 13:37:23,290 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:25] Initializing DB connection       
2022-08-20 13:37:23,356 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:32] Init transaction starting        
2022-08-20 13:37:23,366 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:34] Creating schema                  
2022-08-20 13:37:24,203 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:37] Adding TSV column                
2022-08-20 13:37:24,225 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:40] Adding primary key, required for batch update
2022-08-20 13:37:24,265 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:57] Creating index ss_title_abstract_index
2022-08-21 09:18:55,067 INFO [main] o.j.b.p.d.SemanticScholarPostgresWriter [SemanticScholarPostgresWriter.kt:61] Creating citations material view matview_sscitations
2022-08-21 09:52:50,493 WARN [main] Exposed [ThreadLocalTransactionManager.kt:169] Transaction attempt #0 failed: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.. Statement(s):
                create materialized view if not exists matview_sscitations as
                SELECT ssid_in as ssid, crc32id_in as crc32id, COUNT(*) AS count
                FROM SSCitations C
                GROUP BY ssid, crc32id
                HAVING COUNT(*) >= 3; -- Ignore tail of 0,1,2 cited papers
                create index if not exists SSCitation_matview_index on matview_sscitations using hash(crc32id);

org.jetbrains.exposed.exceptions.ExposedSQLException: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
        at org.jetbrains.exposed.sql.statements.Statement.executeIn$exposed(Statement.kt:61)
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:184) 
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:126) 
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:102) 
        at org.jetbrains.exposed.sql.Transaction.exec(Transaction.kt:93)
olegs commented 2 years ago

Building the indexes in separate transactions is implemented in https://github.com/JetBrains-Research/pubtrends/commit/c90dc89886481904773dc17da994e7e27f416eea
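
For readers who have not opened the commit, the general idea can be sketched as follows. This is not the code from the commit: `Step`, `runInSeparateTransactions`, and the `inTransaction` callback are hypothetical names introduced only for illustration, and the failing executor in `main` is a stand-in that simulates the backend dying during the materialized view step. With Exposed, the callback would presumably be something like `{ sql -> transaction { exec(sql) } }`, which is an assumption about how the project wires it up.

```kotlin
// Hypothetical sketch: run each long DDL step in its own transaction so that
// a failure in a later step cannot roll back work committed by earlier steps.
data class Step(val name: String, val sql: String)

// Executes the steps in order, one transaction per step.
// Stops at the first failure and returns the names of the committed steps.
fun runInSeparateTransactions(
    steps: List<Step>,
    inTransaction: (String) -> Unit  // assumed to open, run and commit one transaction
): List<String> {
    val completed = mutableListOf<String>()
    for (step in steps) {
        try {
            inTransaction(step.sql)
            completed += step.name
        } catch (e: Exception) {
            println("Step '${step.name}' failed: ${e.message}")
            break
        }
    }
    return completed
}

fun main() {
    // Statements taken from the logs in this issue.
    val steps = listOf(
        Step(
            "ss_title_abstract_index",
            "CREATE INDEX IF NOT EXISTS ss_title_abstract_index ON SSPublications using GIN (tsv);"
        ),
        Step(
            "matview_sscitations",
            """
            create materialized view if not exists matview_sscitations as
            SELECT ssid_in as ssid, crc32id_in as crc32id, COUNT(*) AS count
            FROM SSCitations C
            GROUP BY ssid, crc32id
            HAVING COUNT(*) >= 3;
            """.trimIndent()
        ),
        Step(
            "SSCitation_matview_index",
            "create index if not exists SSCitation_matview_index on matview_sscitations using hash(crc32id);"
        )
    )

    // Stand-in executor (no database): fails on the materialized view step.
    val completed = runInSeparateTransactions(steps) { sql ->
        if (sql.startsWith("create materialized view")) error("simulated backend failure")
    }
    println(completed)  // prints [ss_title_abstract_index]
}
```

In this arrangement the roughly 20-hour GIN index build would stay committed even if the materialized view step fails, and because every statement uses `IF NOT EXISTS`, rerunning the loader would skip the steps that already succeeded.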