HathorNetwork / hathor-core

Hathor core is the official and reference client for operating a full node in Hathor Network.
https://hathor.network
Apache License 2.0

Database corruption when stopping node during sync with cache enabled #487

Closed luislhl closed 4 weeks ago

luislhl commented 2 years ago

I'm getting an error when testing hathor-core locally.

With the cache enabled, if I start syncing without a snapshot and then stop the process with Ctrl + C, starting the node again with the same command fails with this error:

2022-09-12 22:05:17 [info     ] [hathor.cli.run_node] hathor-core v0.50.2            genesis=3fdff62 hathor=0.50.2 my_peer_id=<redacted> pid=55649 platform=Linux-5.15.0-47-generic-x86_64-with-glibc2.35 python=3.9.13-CPython settings=/home/luislhl/Workspace/hathor/hathor-core/hathor/conf/mainnet.py
2022-09-12 22:05:17 [info     ] [hathor.cli.run_node] with storage                   path=./data storage_class=TransactionRocksDBStorage
2022-09-12 22:05:17 [info     ] [hathor.cli.run_node] with cache                     capacity=1000 interval=5
2022-09-12 22:05:17 [info     ] [hathor.cli.run_node] with indexes                   indexes_class=RocksDBIndexesManager
2022-09-12 22:05:17 [info     ] [hathor.manager] start manager                  network=mainnet
2022-09-12 22:05:17 [info     ] [hathor.p2p.manager] update whitelist
2022-09-12 22:05:17 [info     ] [hathor.manager] initialize
2022-09-12 22:05:17 [error    ] [hathor.cli.main] Uncaught exception:            
Traceback (most recent call last):
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/cli/main.py", line 157, in main
    sys.exit(CliManager().execute_from_command_line())
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/cli/main.py", line 152, in execute_from_command_line
    module.main()
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/cli/run_node.py", line 703, in main
    RunNode().run()
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/cli/run_node.py", line 692, in __init__
    self.prepare(args)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/cli/run_node.py", line 326, in prepare
    self.start_manager(args)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/cli/run_node.py", line 352, in start_manager
    self.manager.start()
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/manager.py", line 271, in start
    self._initialize_components_new()
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/manager.py", line 576, in _initialize_components_new
    self.tx_storage.indexes._manually_initialize(self.tx_storage)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/indexes/manager.py", line 175, in _manually_initialize
    for tx in progress(tx_storage.topological_iterator(), log=self.log, total=total):
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/util.py", line 447, in progress
    yield from _progress(iter_tx, log=log, total=total)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/util.py", line 468, in _progress
    tx: 'BaseTransaction' = next(iter_tx)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/transaction/storage/transaction_storage.py", line 931, in _topological_sort_timestamp_index
    tx = self.get_transaction(tx_hash)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/transaction/storage/transaction_storage.py", line 345, in get_transaction
    tx = self._get_transaction(hash_bytes)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/transaction/storage/cache_storage.py", line 193, in _get_transaction
    tx = self.store.get_transaction(hash_bytes)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/transaction/storage/transaction_storage.py", line 345, in get_transaction
    tx = self._get_transaction(hash_bytes)
  File "/home/luislhl/Workspace/hathor/hathor-core/hathor/transaction/storage/rocksdb_storage.py", line 106, in _get_transaction
    raise TransactionDoesNotExist(hash_bytes.hex())
hathor.transaction.storage.exceptions.TransactionDoesNotExist: 0000000008bfd52ed4756fc2204d89f3dd3499b4df62113f6adc4949a9ba714c

The command I was using:

poetry run hathor-cli run_node --status 8080 --data ./data --wallet-index --x-rocksdb-indexes --peer ./data/peer_id.json --prometheus --cache --cache-size 1000

I suspected the cache was not flushing its data to the database when the node is stopped, so I tested a fix that flushes the cache on stop. With the fix applied, I no longer get the error:

diff --git a/hathor/cli/run_node.py b/hathor/cli/run_node.py
index 4783131..852c6b1 100644
--- a/hathor/cli/run_node.py
+++ b/hathor/cli/run_node.py
@@ -24,6 +24,8 @@ from autobahn.twisted.resource import WebSocketResource
 from structlog import get_logger
 from twisted.web.resource import Resource

+from hathor.transaction.storage.cache_storage import TransactionCacheStorage
+
 logger = get_logger()
 # LOGGING_CAPTURE_STDOUT = True

diff --git a/hathor/manager.py b/hathor/manager.py
index d912620..87fcad5 100644
--- a/hathor/manager.py
+++ b/hathor/manager.py
@@ -36,7 +36,7 @@ from hathor.profiler import get_cpu_profiler
 from hathor.pubsub import HathorEvents, PubSubManager
 from hathor.transaction import BaseTransaction, Block, MergeMinedBlock, Transaction, TxVersion, sum_weights
 from hathor.transaction.exceptions import TxValidationError
-from hathor.transaction.storage import TransactionStorage
+from hathor.transaction.storage import TransactionStorage, TransactionCacheStorage
 from hathor.transaction.storage.exceptions import TransactionDoesNotExist
 from hathor.util import LogDuration, Random, Reactor
 from hathor.wallet import BaseWallet
@@ -314,6 +314,10 @@ class HathorManager:
             wait_stratum = self.stratum_factory.stop()
             if wait_stratum:
                 waits.append(wait_stratum)
+
+        if isinstance(self.tx_storage, TransactionCacheStorage):
+            self.tx_storage.flush()
+
         return defer.DeferredList(waits)

     def do_discovery(self) -> None:
diff --git a/hathor/transaction/storage/cache_storage.py b/hathor/transaction/storage/cache_storage.py
index 8c9c52f..b888dc5 100644
--- a/hathor/transaction/storage/cache_storage.py
+++ b/hathor/transaction/storage/cache_storage.py
@@ -205,6 +205,9 @@ class TransactionCacheStorage(BaseTransactionStorage):
             self._save_to_weakref(tx)
             yield tx

+    def flush(self):
+        self._flush_to_storage(self.dirty_txs.copy())
+
     def get_count_tx_blocks(self) -> int:
         self._flush_to_storage(self.dirty_txs.copy())
         return self.store.get_count_tx_blocks()

jansegre commented 4 weeks ago

These changes have already been merged.