IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0

Rolling back to genesis with snapshot restoration #1741

Open saravadeanil opened 1 week ago

saravadeanil commented 1 week ago

Versions
The db-sync version: v13.2.0.2
PostgreSQL version: v14.10-alpine

Build/Install Method
The method you use to build or install cardano-db-sync: Docker

Run method
The method you used to run cardano-db-sync (eg Nix/Docker/systemd/none): Kubernetes

Additional context

Hi, I am trying to restore the db-sync component from the official snapshot LINK. However, it rolls back and starts syncing from genesis. According to the logs, it detects the snapshot file, but then removes it and starts to sync from genesis.

Can you please help me with the snapshot restoration?
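
For reference, this is roughly how I trigger the restore before starting db-sync. It is only a sketch of the restore flow described in the repo docs; the pgpass file, snapshot file name, and state directory below are placeholders for my actual values:

# Sketch: restore the database snapshot before db-sync starts
# (file names and paths are placeholders, not my real values)
PGPASSFILE=config/pgpass-mainnet scripts/postgresql-setup.sh --restore-snapshot \
    db-sync-snapshot-schema-13.2-blockXXXXXXX-x86_64.tgz \
    /data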

Problem Report

[db-sync-node.Subscription:Notice:73] [2024-06-24 15:11:40.21 UTC] Identity Required subscriptions started
[db-sync-node:Info:65] [2024-06-24 15:11:42.62 UTC] Delaying delete of 6831214 while rolling back to genesis. Applying blocks until a new block is found. The node is currently at Tip (SlotNo 127675597) b08623b9662e24ce20823e128cf066a550457734619fa56b8d619e4f2854aad5 (BlockNo 10487730)
[db-sync-node:Info:65] [2024-06-24 15:11:42.62 UTC] Removing newer file /data/127612789-1bd207c15c-492.lstate
[db-sync-node:Info:65] [2024-06-24 15:11:42.82 UTC] Found snapshot file for genesis
[db-sync-node:Info:65] [2024-06-24 15:11:42.82 UTC] Setting ConsistencyLevel to DBAheadOfLedger
[db-sync-node:Info:65] [2024-06-24 15:11:59.59 UTC] Reached EpochNo 1
[db-sync-node:Info:74] [2024-06-24 15:11:59.60 UTC] Asynchronously wrote a ledger snapshot to /data/21599-3bd04916b6-0.lstate in 0.012668153s.
[db-sync-node:Info:65] [2024-06-24 15:12:13.72 UTC] Reached EpochNo 2
[db-sync-node:Info:74] [2024-06-24 15:12:13.75 UTC] Asynchronously wrote a ledger snapshot to /data/43199-e9684707f8-1.lstate in 0.029657086s.
[db-sync-node:Info:65] [2024-06-24 15:12:25.76 UTC] Reached EpochNo 3
[db-sync-node:Info:74] [2024-06-24 15:12:25.79 UTC] Asynchronously wrote a ledger snapshot to /data/64799-7cc07fd783-2.lstate in 0.030081998s.
[db-sync-node:Info:65] [2024-06-24 15:12:36.66 UTC] Reached EpochNo 4
[db-sync-node:Info:74] [2024-06-24 15:12:36.69 UTC] Asynchronously wrote a ledger snapshot to /data/86399-c793ac68d1-3.lstate in 0.030914305s.
[db-sync-node:Info:65] [2024-06-24 15:12:47.11 UTC] Reached EpochNo 5
kderme commented 1 week ago

This is not exactly a rollback to genesis, since no db data are deleted; it is only a replay of the ledger state. This still should not happen, though: the snapshot should be compatible with the db-sync version you're running. Also, are there any logs before that?

saravadeanil commented 1 week ago

@kderme Sorry, I don't have the logs now, as I started syncing db-sync from scratch. May I know how long it takes on average for db-sync to fully sync?

kderme commented 1 week ago

This very much depends on the hardware and SSD speed, but it takes days. It's better to let it replay the ledger after the snapshot restoration, which takes hours instead.
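
A quick way to sanity-check that the restore actually took effect (rather than an empty database being re-synced) is to look at the highest block db-sync has stored, for example something like the following. The database name cexplorer is just the conventional default; adjust it to your deployment:

# Sketch: if the snapshot was restored, block_no should be in the millions
# rather than starting again from zero
psql -U postgres -d cexplorer -c "select max(block_no) as highest_block, max(time) as latest_block_time from block;"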

saravadeanil commented 1 day ago

Hi @kderme .

The cardano-db-sync component has stopped syncing for the last ~7-8 hours. Following are the logs from the cardano-db-sync pod:

Inserted epoch 493 from updateEpochWhenSyncing with Cache.
 epoch: Epoch {epochOutSum = 105294714922357004, epochFees = DbLovelace 141437820728, epochTxCount = 398226, epochBlkCount = 20982, epochNo = 493, epochStartTime = 2024-06-23 21:45:12 UTC, epochEndTime = 2024-06-28 21:44:33 UTC}
[db-sync-node:Info:74] [2024-07-01 19:48:16.93 UTC] Asynchronously wrote a ledger snapshot to /data/128044782-7e3a8db889-493.lstate in 154.410885704s.
[db-sync-node:Info:65] [2024-07-01 19:54:29.51 UTC] Inserted 1346982 EpochStake for EpochNo 495
[db-sync-node:Info:65] [2024-07-01 19:59:36.98 UTC] Insert Babbage Block: epoch 494, slot 128134137, block 10510000, hash b571c525cbb6d240a56686791285398d8bb6bb015b312c0bb5d0a81cc4d7b0ff
[db-sync-node:Info:65] [2024-07-01 19:59:36.98 UTC] Pool Offchain metadata fetch: 380 results, 120 fetch errors
[db-sync-node:Info:65] [2024-07-01 20:10:37.56 UTC] Insert Babbage Block: epoch 494, slot 128237399, block 10515000, hash c5b91d7e448aa0d184a3fbea035ef849d879e3d986fdf6e4d113431ca9d63ea9
[db-sync-node:Info:65] [2024-07-01 20:18:20.91 UTC] Pool Offchain metadata fetch: 227 results, 73 fetch errors
[db-sync-node:Info:65] [2024-07-01 20:22:27.53 UTC] The table epoch_stake was given a new unique constraint called unique_epoch_stake
[db-sync-node:Info:65] [2024-07-01 20:26:09.52 UTC] The table reward was given a new unique constraint called unique_reward
[db-sync-node:Info:65] [2024-07-01 20:26:09.53 UTC] Running database migrations in mode Indexes
[db-sync-node:Info:65] [2024-07-01 20:26:09.53 UTC] Found maintenance_work_mem=2GB, max_parallel_maintenance_workers=8
[db-sync-node:Warning:65] [2024-07-01 20:26:09.53 UTC] Creating Indexes. This may take a while. Setting a higher maintenance_work_mem from Postgres usually speeds up this process. These indexes are not used by db-sync but are meant for clients. If you want to skip some of these indexes, you can stop db-sync, delete or modify any migration-4-* files in the schema directory and restart it.

Following are my Postgres configurations:

psql -U postgres -c "SHOW ALL" | grep maintenance
 maintenance_io_concurrency             | 10                                         | A variant of effective_io_concurrency that is used for maintenance work.
 maintenance_work_mem                   | 6GB                                        | Sets the maximum memory to be used for maintenance operations.
 max_parallel_maintenance_workers       | 6                                          | Sets the maximum number of parallel processes per maintenance operation.
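
If it helps, I could raise the maintenance settings on the Postgres side and reload the configuration, roughly like this. This is only a sketch; the 8GB value is an example and assumes the server has enough RAM for it:

# Sketch: raise the settings the index-creation warning points at, then reload
psql -U postgres -c "ALTER SYSTEM SET maintenance_work_mem = '8GB';"
psql -U postgres -c "ALTER SYSTEM SET max_parallel_maintenance_workers = 8;"
psql -U postgres -c "SELECT pg_reload_conf();"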

Node status so far:

cardano-cli query tip --mainnet --socket-path /ipc/node.socket
{
    "block": 10519687,
    "epoch": 494,
    "era": "Babbage",
    "hash": "e3ba56862f724f8c4a401feec6b5db93ba5ee489ba86a7915a36e8b5fbc54363",
    "slot": 128331702,
    "slotInEpoch": 286902,
    "slotsToEpochEnd": 145098,
    "syncProgress": "100.00"
}
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "{ cardanoDbMeta { initialized syncPercentage }}"}' \
  http://localhost:3100/graphql

{"data":{"cardanoDbMeta":{"initialized":false,"syncPercentage":99.97}}}
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "{ cardano { tip { number slotNo epoch { number } } } }"}' http://localhost:3100/graphql
{"data":{"cardano":{"tip":{"number":10517963,"slotNo":"128297038","epoch":null}}}}

Do you have any suggestions on how I can resolve this?

saravadeanil commented 1 day ago

So, I have now skipped the migration-4-* files under the schema directory and restarted the cardano-db-sync pod, based on the log message below, and my node has caught up with the latest block height and is syncing.

[db-sync-node:Warning:65] [2024-07-02 08:27:52.38 UTC] Creating Indexes. This may take a while. Setting a higher maintenance_work_mem from Postgres usually speeds up this process. These indexes are not used by db-sync but are meant for clients. If you want to skip some of these indexes, you can stop db-sync, delete or modify any migration-4-* files in the schema directory and restart it.
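
If the client-side indexes are needed later, my understanding is that they can either be created manually or the migration-4-* files can be put back and db-sync restarted so it re-runs the Indexes migration. A rough illustration of creating one by hand; the index name and column here are only an example, the real definitions live in the migration-4-* files:

# Sketch: build a client-facing index manually without blocking writes
# (idx_tx_out_address is an illustrative name, not necessarily the real one)
psql -U postgres -d cexplorer -c "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tx_out_address ON tx_out (address);"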

Would there be any issues or implications from this change, given that the db indexes are not present?