Open mkingori opened 4 years ago
Some info that would help:
To recover, you can usually start dcrdata with the following switch added to your usual command: --purge-n-blocks N
, where N
is at least the number that "stakedb" is behind "primary db". In this case, just use 32 or more.
Seems dcrdata took too long to shutdown. I was forced to close the terminal instead. The command --purge-n-blocks N
has helped me solve the issues.
I'm glad that resolved it for you.
Since it seems to be relatively common that dcrdata gets killed forcibly without shutting down gracefully, and that the message says nothing about how to recover by purging blocks (nor does it try to do it automatically). I'm going to reopen this issue so we can resolve it properly.
@chappjc I have been facing this issue for sometime now, purging didn't help. And in my case, dcrdata was running on a dev server without any sort of interruption. Solution that worked for me but ain't certain if if has consequences.
Also, I found out duplicate rows are not removed for votes, misses and tickets table. https://github.com/decred/dcrdata/blob/fab8a7e7bde6f4382ae9613db706a41788e7a0f3/db/dcrpg/sync.go#L393-L402 https://github.com/decred/dcrdata/blob/fab8a7e7bde6f4382ae9613db706a41788e7a0f3/db/dcrpg/tables.go#L199-L312
Good finds.
Any idea how to reproduce that error? I literally never hit it, and I run several servers in production. Unclean shutdown is the only thing that can cause it afaik
I dropped dcrdata db, deleted data dir, and ran dcrdata afresh uninterrupted(pm2 mode) , still panic persist. Until I applied a fast-forward to take stakedb to main db height.
You have to be dropping the wrong data. How else would it report any db heights?
Wait, how does pm2 start and stop dcrdata? Also, what does that do for you? You can run in a tmux.
You have to be dropping the wrong data. How else would it report any db heights?
I removed ~/.dcrdata/data
path and droped dcrdata db, so am starting afresh.
Pm2 helps me keep dcrdata process running on my dev server while I am offline. Maybe I will give tmux a try.
What I don't understand about pm2 is how it stops dcrdata. You cannot just kill it.
I removed ~/.dcrdata/data path and droped dcrdata db, so am starting afresh.
If dcrdata reports any positive DB heights other than dcrd itself, you've not started fresh. Paste the part of the startup logs that looks like this:
[INF] PSQL: DB schema version 1.11.0
[DBG] PSQL: meta height 628189 / blocks height 628189
...
[DBG] DATD: chainDB height: 628189
...
[INF] DATD: Starting blockchain sync...
[INF] PSQL: Current best block (dcrd): 631276
[INF] PSQL: Current best block (primary db): 628189
[INF] PSQL: Current best block (stakedb): 628189
Also please paste the error message with the block heights.
BTW, when I ask how to "reproduce that error", I'm asking for steps to causing the error from a functioning setup. I understand you are getting it repeatedly, but a reproduction is how you get to that state.
First log output from dcrdata
2022-01-19 20:59:07.531 [INF] PSQL: Empty database "dcrdata". Creating tables...
2022-01-19 20:59:07.532 [INF] PSQL: Creating the "meta" table.
2022-01-19 20:59:07.552 [INF] PSQL: Creating the "blocks" table.
2022-01-19 20:59:07.560 [INF] PSQL: Creating the "transactions" table.
2022-01-19 20:59:07.564 [INF] PSQL: Creating the "vins" table.
2022-01-19 20:59:07.578 [INF] PSQL: Creating the "vouts" table.
2022-01-19 20:59:07.583 [INF] PSQL: Creating the "block_chain" table.
2022-01-19 20:59:07.590 [INF] PSQL: Creating the "addresses" table.
2022-01-19 20:59:07.595 [INF] PSQL: Creating the "tickets" table.
2022-01-19 20:59:07.598 [INF] PSQL: Creating the "votes" table.
2022-01-19 20:59:07.602 [INF] PSQL: Creating the "misses" table.
2022-01-19 20:59:07.607 [INF] PSQL: Creating the "agendas" table.
2022-01-19 20:59:07.611 [INF] PSQL: Creating the "agenda_votes" table.
2022-01-19 20:59:07.613 [INF] PSQL: Creating the "testing" table.
2022-01-19 20:59:07.615 [INF] PSQL: Creating the "stats" table.
2022-01-19 20:59:07.620 [INF] PSQL: Creating the "treasury" table.
2022-01-19 20:59:07.622 [INF] PSQL: Creating the "swaps" table.
2022-01-19 20:59:07.630 [INF] PSQL: Pre-loading unspent ticket info for InsertVote optimization.
2022-01-19 20:59:07.631 [INF] PSQL: Setting PostgreSQL DB statement timeout to 20m0s.
2022-01-19 20:59:07.648 [INF] DATD: chainDB block summary table is empty.
2022-01-19 20:59:07.654 [INF] AGDB: agendas.db version 1.0.0 was set
2022-01-19 20:59:07.662 [INF] PRDB: proposals.db version 2.0.0 was set
2022-01-19 20:59:07.838 [INF] EXPR: Mean Voting Blocks calculated: 1006
2022-01-19 20:59:07.886 [INF] EXPR: Starting WebsocketHub run loop.
2022-01-19 20:59:07.887 [INF] PUBS: Starting WebsocketHub run loop.
2022-01-19 20:59:07.914 [INF] DATD: chainDB block summary table is empty.
2022-01-19 20:59:07.914 [INF] IAPI: Started Insight socket.io server.
2022-01-19 20:59:07.916 [INF] DATD: Now serving the explorer and APIs on http://localhost:17778/
2022-01-19 20:59:08.167 [INF] DATD: Starting blockchain sync...
2022-01-19 20:59:08.168 [INF] PSQL: Tables are empty, starting fresh.
2022-01-19 20:59:08.169 [INF] PSQL: Current best block (dcrd): 341542
2022-01-19 20:59:08.169 [INF] PSQL: Current best block (primary db): -1
2022-01-19 20:59:08.169 [INF] PSQL: Current best block (stakedb): 0
2022-01-19 20:59:08.169 [INF] PSQL: Collecting all UTXO data prior to height 0...
2022-01-19 20:59:08.169 [INF] PSQL: Pre-warming UTXO cache with 0 UTXOs...
2022-01-19 20:59:08.169 [INF] PSQL: UTXO cache is ready.
2022-01-19 20:59:08.169 [INF] PSQL: Large bulk load: Removing indexes and disabling duplicate checks.
2022-01-19 20:59:08.172 [INF] PSQL: Creating temporary index on addresses(tx_vin_vout_row_id).
2022-01-19 20:59:08.174 [INF] PSQL: Beginning SYNC STAGE 1 of 5 (block data import).
2022-01-19 20:59:08.174 [INF] PSQL: Scanning genesis block into chain db.
2022-01-19 20:59:08.187 [INF] PSQL: Processing blocks 1 to 500...
2022-01-19 20:59:15.555 [INF] PSQL: Processing blocks 501 to 1000...
2022-01-19 20:59:25.511 [INF] PSQL: Processing blocks 1001 to 1500...
2022-01-19 20:59:28.193 [INF] PSQL: ( 53 blk/s, 480 tx/s, 562 vin/sec, 1697 vout/s)
2022-01-19 20:59:48.182 [INF] PSQL: ( 21 blk/s, 540 tx/s, 746 vin/sec, 2011 vout/s)
2022-01-19 20:59:48.635 [INF] PSQL: Processing blocks 1501 to 2000...
2022-01-19 21:00:07.542 [INF] PSQL: Processing blocks 2001 to 2500...
Everything went well until:
panic: About to connect the wrong block: 549218, 546806
The stake database is corrupted. Restart with --purge-n-blocks=4824 to recover.
goroutine 1 [running]:
github.com/decred/dcrdata/db/dcrpg/v7.(*ChainDB).SyncChainDB.func6(0x86162, 0x128cf48, 0x32, 0xc017afa1a0)
/root/dcrdata/db/dcrpg/sync.go:308 +0x13bf
github.com/decred/dcrdata/db/dcrpg/v7.(*ChainDB).SyncChainDB(0xc0004c8000, 0x161d5f8, 0xc000357200, 0x161da20, 0xc017277180, 0x101, 0x0, 0xc017436540, 0x0, 0x0, ...)
/root/dcrdata/db/dcrpg/sync.go:361 +0xfc3
main._main.func10(0x0, 0x0, 0x0)
/root/dcrdata/cmd/dcrdata/main.go:812 +0xd7
main._main(0x161d5f8, 0xc000357200, 0x0, 0x0)
/root/dcrdata/cmd/dcrdata/main.go:825 +0x3839
main.main()
/root/dcrdata/cmd/dcrdata/main.go:60 +0x85
panic: About to connect the wrong block: 549217, 546806
The stake database is corrupted. Restart with --purge-n-blocks=4822 to recover.
Since then, the panic persisted at different heights. At some point I did purge main db(took over 8hours), but as soon as it synced for sometime, I saw same panic err.
Thank you for pasting that. I am surprised anything like that happened during the initial sync.
If the initial sync (pre-indexing) is interrupted, you really shouldn't bother trying to purge anything, just drop the whole table and datadir. So have you not been able to get a full sync of dcrdata so far because of this? Always panic during stage 1 of 5?
So have you not been able to get a full sync of dcrdata so far because of this? Always panic during stage 1 of 5?
yes.
That sucks. I don't know what to recommend. I assume you're not out of disk space? What did you mean by "fast forward"? Like this? https://github.com/decred/dcrdata/blob/fab8a7e7bde6f4382ae9613db706a41788e7a0f3/db/dcrpg/chkdcrpg/main.go#L197-L221
BTW, those blks/sec are slow. What kind of disk are you using, not that it would cause the error, just slowness.
2022-01-19 20:59:08.169 [INF] PSQL: Current best block (dcrd): 341542
Do you know why this is not more like 870897? Is dcrd syncing at the same time?
Do you know why this is not more like 870897? Is dcrd syncing at the same time?
As of then, yes. dcrd finished syncing in less than 8hours.
OK, don't try to do dcrdata sync until dcrd has finished it's sync/IBD (initial block download). Does it still panic even if dcrd is idle and all synchronized with the network?
What did you mean by "fast forward"? Like this?
Yeah, the "fast forward" loop runs in the main import loop instead of a panic.
Does it still panic even if dcrd is idle and all synchronized with the network?
It still panics.
OK I'm going to run a fresh sync on both mainnet and testnet. The fast forward loop is not a solution though. We need to get to the root cause.
I am running a dev server with Ubuntu 20.04.3 LTS (GNU/Linux 5.4.0-96-generic x86_64) 154GB space.
And how is disk utilization then? The production testnet machine uses up almost all of that on just the postgresql DB:
postgres=# SELECT datname, pg_size_pretty(pg_database_size(datname)) db_size FROM pg_database ORDER BY db_size;
datname | db_size
-----------+---------
dcrdata | 145 GB
The dcrdata datadir will take about 1.5GB on top of that.
The dcrd datadir is about 21G.
df -H
says the drive is at 221G utilization with everything else.
df -H says the drive is at 221G utilization with everything else.
That's a lot of space, I might be able to add more space soon. But without the "fast-forward" I can't get past init block import.
I think you're out of disk space...
When you hit the error, check df -h
and report back. Also check dcrd logs and journalctl -e
I think you're out of disk space...
Yeah, noticed it during setting spending info in addresses table.
That's a lot of space
Yeah, 120 GB is a surprising amount. On mainnet:
postgres=# SELECT datname, pg_size_pretty(pg_database_size(datname)) db_size FROM pg_database ORDER BY db_size;
datname | db_size
-----------------+---------
dcrdata_mainnet | 85 GB
I'm not sure why testnet surged so much recently.
du -h
report:
udev 3.9G 0 3.9G 0% /dev
tmpfs 796M 1.1M 795M 1% /run
/dev/vda1 155G 143G 13G 93% /
tmpfs 3.9G 16K 3.9G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda15 105M 5.2M 100M 5% /boot/efi
/dev/loop2 68M 68M 0 100% /snap/lxd/21835
/dev/loop1 68M 68M 0 100% /snap/lxd/21545
/dev/loop3 44M 44M 0 100% /snap/snapd/14295
/dev/loop4 44M 44M 0 100% /snap/snapd/14549
/dev/loop5 62M 62M 0 100% /snap/core20/1270
tmpfs 796M 0 796M 0% /run/user/0
/dev/loop6 62M 62M 0 100% /snap/core20/1328
Scraping the limits of usable space there. You should check the postgresql logs as well as dcrd logs and journalctl. I'd be very surprised if at least one of those wasn't throwing messages about failure to allocate. Please try on a machine with more space. However, you might be able to sync mainnet on that machine since for whatever reason it needs less space than testnet. Even that will be tight with both dcrdata and dcrd on the same drive.
dcrd is has been running smooth. For postgresql and stakeDB, space wasn't a concern during block import stage. Although, I will add more space soon since a complete sync would require about 150GB dedicated to dcrdata alone.
fresh mainnet sync stage 1 passed:
2022-02-06 17:44:07.553 [INF] PSQL: Processing blocks 631001 to 631317...
2022-02-06 17:44:15.453 [INF] PSQL: ( 40 blk/s, 725 tx/s, 4134 vin/sec, 4953 vout/s)
2022-02-06 17:44:15.887 [INF] PSQL: Avg. speed: 1145 tx/s, 5152 vout/s
2022-02-06 17:44:15.887 [INF] PSQL: Block import elapsed: 162.01 minutes, 631318 blocks (64.95 blocks/s)
2022-02-06 17:44:15.887 [INF] PSQL: Beginning SYNC STAGE 2 of 5 (duplicate row removal).
...
2022-02-06 17:49:23.124 [INF] PSQL: Beginning SYNC STAGE 3 of 5 (table indexing and analyzing).
...
2022-02-06 17:56:38.334 [INF] PSQL: Beginning SYNC STAGE 4 of 5 (deep database ANALYZE).
2022-02-06 17:57:38.349 [INF] PSQL: Beginning SYNC STAGE 5 of 5 (setting spending info in addresses table). This will take a while.
2022-02-06 17:57:38.351 [INF] PSQL: Updating address rows for blocks [0,9999]...
2022-02-06 17:57:52.104 [INF] PSQL: Updating address rows for blocks [10000,19999]...
...
2022-02-06 18:17:49.033 [INF] PSQL: Updating address rows for blocks [630000,631317]...
2022-02-06 18:17:52.871 [INF] PSQL: Updated 33460663 rows of addresses table.
2022-02-06 18:17:52.872 [INF] PSQL: Indexing addresses table on matching_tx_hash...
2022-02-06 18:19:16.418 [INF] PSQL: Performing an ANALYZE(600) on addresses table...
2022-02-06 18:19:19.576 [INF] PSQL: Catching up with network at block height 631326 from 631317...
2022-02-06 18:19:20.610 [INF] PSQL: SYNC COMPLETED at height 631326. Delta:
631327 blocks
11138463 transactions
37259547 ins
50082487 outs
72150839 addresses
will update when I do testnet3
testnet
2022-02-06 22:08:39.126 [INF] PSQL: Processing blocks 870501 to 871000...
2022-02-06 22:08:46.132 [INF] PSQL: ( 42 blk/s, 691 tx/s, 4080 vin/sec, 4828 vout/s)
2022-02-06 22:08:51.330 [INF] PSQL: Processing blocks 871001 to 871106...
2022-02-06 22:08:53.892 [INF] PSQL: Avg. speed: 961 tx/s, 4721 vout/s
2022-02-06 22:08:53.892 [INF] PSQL: Block import elapsed: 223.13 minutes, 871107 blocks (65.07 blocks/s)
2022-02-06 22:08:53.893 [INF] PSQL: Beginning SYNC STAGE 2 of 5 (duplicate row removal).
...
2022-02-06 22:15:37.713 [INF] PSQL: Beginning SYNC STAGE 3 of 5 (table indexing and analyzing).
...
2022-02-06 22:24:56.163 [INF] PSQL: Beginning SYNC STAGE 4 of 5 (deep database ANALYZE).
2022-02-06 22:26:11.787 [INF] PSQL: Beginning SYNC STAGE 5 of 5 (setting spending info in addresses table). This will take a while.
2022-02-06 22:26:11.794 [INF] PSQL: Updating address rows for blocks [0,9999]...
2022-02-06 22:26:33.473 [INF] PSQL: Updating address rows for blocks [10000,19999]...
...
2022-02-06 22:55:34.173 [INF] PSQL: Updating address rows for blocks [870000,871106]...
2022-02-06 22:55:38.609 [INF] PSQL: Updated 43088652 rows of addresses table.
2022-02-06 22:55:38.609 [INF] PSQL: Indexing addresses table on matching_tx_hash...
2022-02-06 22:57:45.107 [INF] PSQL: Performing an ANALYZE(600) on addresses table...
2022-02-06 22:57:52.259 [INF] PSQL: Catching up with network at block height 871135 from 871106...
2022-02-06 22:57:54.661 [INF] PSQL: SYNC COMPLETED at height 871135. Delta:
871136 blocks
12866327 transactions
48427059 ins
63206847 outs
91843276 addresses
Mainet Stage 1 syncing in < 3hr and testnet in < 4hrs is fast, and no connect block error. Will run testnet once I top my server space.
Hi,
When i launch
dcrdata
, with the command./dcrdata
, I am getting a panic ofabout to connect the wrong block
. See the logs.