decred / dcrdata

Decred block explorer, with packages and apps for data collection and storage. Written in Go.
https://dcrdata.decred.org/
ISC License
129 stars 128 forks source link

dcrpg: Remove duplicates for `votes`, `misses` and `treasury` table. #1978

Open ukane-philemon opened 11 months ago

ukane-philemon commented 11 months ago

Some time ago, I tried to sync dcrdata but encountered many issues, I even had to start afresh after my first attempt to fix it didn't work.

PG version: 15 OS: MacOS

  1. Duplicates found in votes table: This prevented creating a UNIQUE index uix_votes_hashes_index on tx_hash, block_hash.

dcrdata error:

``` 2024-01-03 16:49:02.236 [INF] PSQL: Beginning SYNC STAGE 2 of 5 (duplicate row removal). 2024-01-03 16:49:02.236 [INF] PSQL: Dropping temporary index on addresses(tx_vin_vout_row_id). 2024-01-03 16:49:02.733 [INF] PSQL: Finding and removing duplicate table rows before indexing... 2024-01-03 16:49:02.733 [INF] PSQL: Finding and removing duplicate vins entries... 2024-01-03 16:56:30.223 [INF] PSQL: Removed 719 duplicate vins entries. 2024-01-03 16:56:30.224 [INF] PSQL: Finding and removing duplicate vouts entries... 2024-01-03 17:07:22.226 [INF] PSQL: Removed 569 duplicate vouts entries. 2024-01-03 17:07:22.228 [INF] PSQL: Finding and removing duplicate transactions entries... 2024-01-03 17:09:04.828 [INF] PSQL: Removed 38 duplicate transactions entries. 2024-01-03 17:09:04.829 [INF] PSQL: Finding and removing duplicate agendas entries... 2024-01-03 17:09:04.843 [INF] PSQL: Removed 0 duplicate agendas entries. 2024-01-03 17:09:04.843 [INF] PSQL: Finding and removing duplicate agenda_votes entries... 2024-01-03 17:09:24.239 [INF] PSQL: Removed 0 duplicate agenda_votes entries. 2024-01-03 17:09:24.240 [INF] PSQL: Beginning SYNC STAGE 3 of 5 (table indexing and analyzing). 2024-01-03 17:09:24.240 [INF] PSQL: Indexing blocks table on hash... 2024-01-03 17:09:36.928 [INF] PSQL: Indexing blocks table on height... 2024-01-03 17:09:43.208 [INF] PSQL: Indexing blocks table on time... 2024-01-03 17:09:47.241 [INF] PSQL: Indexing transactions table on tx/block hashes... 2024-01-03 17:11:24.276 [INF] PSQL: Indexing transactions table on block id/idx... 2024-01-03 17:12:46.724 [INF] PSQL: Indexing transactions table on block height... 2024-01-03 17:13:26.536 [INF] PSQL: Indexing vins table on txin... 2024-01-03 17:17:58.829 [INF] PSQL: Indexing vins table on prevouts... 2024-01-03 17:22:45.541 [INF] PSQL: Indexing vouts table on tx hash and index... 2024-01-03 17:29:31.169 [INF] PSQL: Indexing vouts table on spend tx row id... 2024-01-03 17:32:04.452 [INF] PSQL: Indexing votes table on candidate block... 2024-01-03 17:32:30.367 [INF] PSQL: Indexing votes table on block hash... 2024-01-03 17:32:47.601 [INF] PSQL: Indexing votes table on block+tx hash... 2024-01-03 17:32:53.348 [ERR] DATD: dcrpg.SyncChainDB failed at height 1293011. 2024-01-03 17:32:53.348 [INF] DATD: Shutdown requested. Shutting down... 2024-01-03 17:32:53.348 [INF] DATD: Gracefully shutting down web server... 2024-01-03 17:32:53.348 [DBG] JAPI: Got quit signal. Exiting block connected handler for STATUS monitor. 2024-01-03 17:32:53.348 [DBG] DATD: Block prefetcher hits = 1148792, misses = 2. 2024-01-03 17:32:53.348 [INF] PUBS: Stopping websocket hub. 2024-01-03 17:32:53.349 [DBG] PUBS: Unregistered and killed 0 clients. 2024-01-03 17:32:53.349 [INF] EXPR: Stopping websocket hub. 2024-01-03 17:32:54.922 [DBG] SKDB: badger: Got compaction priority: {level:0 score:1.73 dropPrefixes:[]} 2024-01-03 17:32:55.762 [DBG] SKDB: badger: LOG Compact. Added 1293014 keys. Skipped 672 keys. Iteration took: 838.997359ms 2024-01-03 17:32:56.411 [DBG] SKDB: badger: Discard stats: map[0:269386] 2024-01-03 17:32:56.425 [INF] SKDB: badger: LOG Compact 0->1, del 2 tables, add 1 tables, took 1.502872897s 2024-01-03 17:32:56.426 [DBG] SKDB: badger: Compaction for level: 0 DONE 2024-01-03 17:32:57.842 [INF] DATD: Closing connection to dcrd. 2024-01-03 17:32:57.843 [INF] DATD: Bye! 2024-01-03 17:32:58.095 [ERR] DATD: IndexAll failed: pq: could not create unique index "uix_votes_hashes_index" ```

Postgres log:

``` 2024-01-03 17:29:10.647 WAT [11497] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 134 recycled; write=0.019 s, sync=0.001 s, total=0.175 s; sync files=0, longest=0.000 s, average=0.000 s; distance=2196331 kB, estimate=2198084 kB 2024-01-03 17:32:53.261 WAT [305] ERROR: could not create unique index "uix_votes_hashes_index" 2024-01-03 17:32:53.261 WAT [305] DETAIL: Key (tx_hash, block_hash)=(6ad77b63e7df9b66ca513b49b6e3c1fa7af62d5d72e290a9d2731214eaa12a85, 0000003c7c7cd1d990c5918e7b23b2d5e1ca397095d93b83e14997d0f7c6c15c) is duplicated. 2024-01-03 17:32:53.261 WAT [305] STATEMENT: CREATE UNIQUE INDEX uix_votes_hashes_index ON votes(tx_hash, block_hash); 2024-01-03 17:32:53.261 WAT [97569] ERROR: could not create unique index "uix_votes_hashes_index" 2024-01-03 17:32:53.261 WAT [97569] DETAIL: Key (tx_hash, block_hash)=(6ad77b63e7df9b66ca513b49b6e3c1fa7af62d5d72e290a9d2731214eaa12a85, 0000003c7c7cd1d990c5918e7b23b2d5e1ca397095d93b83e14997d0f7c6c15c) is duplicated. 2024-01-03 17:32:53.261 WAT [97569] CONTEXT: parallel worker 2024-01-03 17:32:53.261 WAT [97569] STATEMENT: CREATE UNIQUE INDEX uix_votes_hashes_index ON votes(tx_hash, block_hash); 2024-01-03 17:32:53.268 WAT [306] FATAL: terminating background worker "parallel worker" due to administrator command 2024-01-03 17:32:53.268 WAT [306] STATEMENT: CREATE UNIQUE INDEX uix_votes_hashes_index ON votes(tx_hash, block_hash); ```
  1. Duplicates found in misses table: This prevented creating a UNIQUE index uix_misses_hashes_index on ticket_hash, block_hash. dcrdata error:
2024-01-03 21:40:00.003 [INF] PSQL: Beginning SYNC STAGE 2 of 5 (duplicate row removal).
2024-01-03 21:40:00.003 [INF] PSQL: Dropping temporary index on addresses(tx_vin_vout_row_id).
2024-01-03 21:40:00.170 [INF] PSQL: Finding and removing duplicate table rows before indexing...
2024-01-03 21:40:00.170 [INF] PSQL: Finding and removing duplicate vins entries...
2024-01-03 21:47:00.622 [INF] PSQL: Removed 0 duplicate vins entries.
2024-01-03 21:47:00.623 [INF] PSQL: Finding and removing duplicate vouts entries...
2024-01-03 21:55:25.063 [INF] PSQL: Removed 0 duplicate vouts entries.
2024-01-03 21:55:25.064 [INF] PSQL: Finding and removing duplicate transactions entries...
2024-01-03 21:56:49.243 [INF] PSQL: Removed 0 duplicate transactions entries.
2024-01-03 21:56:49.244 [INF] PSQL: Finding and removing duplicate agendas entries...
2024-01-03 21:56:49.255 [INF] PSQL: Removed 0 duplicate agendas entries.
2024-01-03 21:56:49.255 [INF] PSQL: Finding and removing duplicate agenda_votes entries...
2024-01-03 21:57:08.451 [INF] PSQL: Removed 0 duplicate agenda_votes entries.
2024-01-03 21:57:08.451 [INF] PSQL: Finding and removing duplicate votes entries...
2024-01-03 21:57:35.071 [INF] PSQL: Removed 8 duplicate votes entries.
2024-01-03 21:57:35.161 [INF] PSQL: Beginning SYNC STAGE 3 of 5 (table indexing and analyzing).
2024-01-03 21:57:35.161 [INF] PSQL: Indexing blocks table on hash...
2024-01-03 21:57:47.635 [INF] PSQL: Indexing blocks table on height...
2024-01-03 21:57:55.621 [INF] PSQL: Indexing blocks table on time...
2024-01-03 21:58:02.184 [INF] PSQL: Indexing transactions table on tx/block hashes...
2024-01-03 21:59:34.610 [INF] PSQL: Indexing transactions table on block id/idx...
2024-01-03 22:00:51.359 [INF] PSQL: Indexing transactions table on block height...
2024-01-03 22:01:29.164 [INF] PSQL: Indexing vins table on txin...
2024-01-03 22:05:39.943 [INF] PSQL: Indexing vins table on prevouts...
2024-01-03 22:10:33.110 [INF] PSQL: Indexing vouts table on tx hash and index...
2024-01-03 22:16:12.472 [INF] PSQL: Indexing vouts table on spend tx row id...
2024-01-03 22:18:16.488 [INF] PSQL: Indexing votes table on candidate block...
2024-01-03 22:18:40.167 [INF] PSQL: Indexing votes table on block hash...
2024-01-03 22:19:01.400 [INF] PSQL: Indexing votes table on block+tx hash...
2024-01-03 22:19:31.727 [INF] PSQL: Indexing votes table on vote version...
2024-01-03 22:19:41.788 [INF] PSQL: Indexing votes table on height...
2024-01-03 22:19:45.512 [INF] PSQL: Indexing votes table on Block Time...
2024-01-03 22:19:50.109 [INF] PSQL: Indexing misses table...
2024-01-03 22:19:51.629 [ERR] DATD: dcrpg.SyncChainDB failed at height 1293156.
2024-01-03 22:19:51.629 [DBG] DATD: Block prefetcher hits = 288, misses = 2.
2024-01-03 22:19:51.629 [INF] PUBS: Stopping websocket hub.
2024-01-03 22:19:51.630 [DBG] PUBS: Unregistered and killed 0 clients.
2024-01-03 22:19:51.630 [INF] EXPR: Stopping websocket hub.
2024-01-03 22:19:51.629 [INF] DATD: Shutdown requested. Shutting down...
2024-01-03 22:19:51.631 [INF] DATD: Gracefully shutting down web server...
2024-01-03 22:19:51.631 [DBG] JAPI: Got quit signal. Exiting block connected handler for STATUS monitor.
2024-01-03 22:19:51.686 [DBG] SKDB: badger: Got compaction priority: {level:0 score:1.73 dropPrefixes:[]}
2024-01-03 22:19:52.931 [DBG] SKDB: badger: LOG Compact. Added 1293159 keys. Skipped 1 keys. Iteration took: 1.243749776s
2024-01-03 22:19:53.870 [DBG] SKDB: badger: Discard stats: map[]
2024-01-03 22:19:53.881 [INF] SKDB: badger: LOG Compact 0->1, del 2 tables, add 1 tables, took 2.194786337s
2024-01-03 22:19:53.881 [DBG] SKDB: badger: Compaction for level: 0 DONE
2024-01-03 22:19:53.965 [INF] DATD: Closing connection to dcrd.
2024-01-03 22:19:53.965 [INF] DATD: Bye!
2024-01-03 22:19:54.217 [ERR] DATD: IndexAll failed: pq: could not create unique index "uix_misses_hashes_index"

Postgres log:

``` 2024-01-03 22:18:08.619 WAT [11497] LOG: checkpoint complete: wrote 15 buffers (0.0%); 0 WAL file(s) added, 0 removed, 134 recycled; write=1.591 s, sync=0.011 s, total=1.748 s; sync files=15, longest=0.007 s, average=0.001 s; distance=2207201 kB, estimate=2207201 kB 2024-01-03 22:19:51.594 WAT [15152] ERROR: could not create unique index "uix_misses_hashes_index" 2024-01-03 22:19:51.594 WAT [15152] DETAIL: Key (ticket_hash, block_hash)=(02ecc9d092924d0f1c12e3e974d9c903a1244020250b437ae0fd1b131b220521, 0000007708e4ec4f808b6ac1038786aa6e2dcf7e2932d36b1aa48dfff3b1f906) is duplicated. 2024-01-03 22:19:51.594 WAT [15152] STATEMENT: CREATE UNIQUE INDEX uix_misses_hashes_index ON misses(ticket_hash, block_hash); ```
  1. Duplicates in treasury table: This prevented creating a UNIQUE index uix_treasury_tx_hash on tx_hash, block_hash
2024-01-03 23:46:18.461 [INF] PSQL: Indexing addresses table on block time...
2024-01-03 23:48:49.292 [INF] PSQL: Indexing addresses table on address...
2024-01-03 23:53:28.386 [INF] PSQL: Indexing addresses table on vout DB ID...
2024-01-03 23:58:56.456 [INF] PSQL: Indexing treasury on tx hash...
2024-01-03 23:58:59.041 [ERR] DATD: dcrpg.SyncChainDB failed at height 1293188.
2024-01-03 23:58:59.042 [DBG] DATD: Block prefetcher hits = 62, misses = 2.
2024-01-03 23:58:59.042 [INF] PUBS: Stopping websocket hub.
2024-01-03 23:58:59.042 [DBG] PUBS: Unregistered and killed 0 clients.
2024-01-03 23:58:59.042 [INF] EXPR: Stopping websocket hub.
2024-01-03 23:58:59.042 [INF] DATD: Shutdown requested. Shutting down...
2024-01-03 23:58:59.045 [INF] DATD: Gracefully shutting down web server...
2024-01-03 23:58:59.045 [DBG] JAPI: Got quit signal. Exiting block connected handler for STATUS monitor.
2024-01-03 23:58:59.218 [DBG] SKDB: badger: Got compaction priority: {level:0 score:1.73 dropPrefixes:[]}
2024-01-03 23:59:00.958 [DBG] SKDB: badger: LOG Compact. Added 1293191 keys. Skipped 1 keys. Iteration took: 1.737287975s
2024-01-03 23:59:01.588 [DBG] SKDB: badger: Discard stats: map[]
2024-01-03 23:59:01.604 [INF] SKDB: badger: LOG Compact 0->1, del 2 tables, add 1 tables, took 2.384352759s
2024-01-03 23:59:01.604 [DBG] SKDB: badger: Compaction for level: 0 DONE
2024-01-03 23:59:01.647 [INF] DATD: Closing connection to dcrd.
2024-01-03 23:59:01.648 [INF] DATD: Bye!
2024-01-03 23:59:01.900 [ERR] DATD: IndexAll failed: pq: could not create unique index "uix_treasury_tx_hash"

Postgres log:

``` 001 s, total=0.562 s; sync files=0, longest=0.000 s, average=0.000 s; distance=2194497 kB, estimate=2200891 kB 2024-01-03 23:58:59.001 WAT [20792] ERROR: could not create unique index "uix_treasury_tx_hash" 2024-01-03 23:58:59.001 WAT [20792] DETAIL: Key (tx_hash, block_hash)=(095c8608ab2c44d783188ce96d04ec86c610fee7404d9c51c630ff4adf3c00d6, 0000007708e4ec4f808b6ac1038786aa6e2dcf7e2932d36b1aa48dfff3b1f906) is duplicated. 2024-01-03 23:58:59.001 WAT [20792] STATEMENT: CREATE UNIQUE INDEX uix_treasury_tx_hash ON treasury(tx_hash, block_hash); ```

I have a working suggestion that fixes these issues and will submit it as a PR.

ukane-philemon commented 11 months ago

Also, I found out duplicate rows are not removed for votes, misses and tickets table.

I'm yet to encounter issues with the tickets table.

Related: https://github.com/decred/dcrdata/issues/1562#issuecomment-1030886507