Chia-Network / chia-blockchain

Chia blockchain python implementation (full node, farmer, harvester, timelord, and wallet)
Apache License 2.0

[Bug] Chia node crashed #17124

Closed kicsiko closed 8 months ago

kicsiko commented 10 months ago

What happened?

I'm running the Chia node in Docker. Over the last few days the container has crashed several times. I have attached the logs showing the problem.

Version

2.1.3

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

chia  | 2023-12-20T21:22:04.655 full_node full_node_server        : ERROR    Exception: A process in the process pool was terminated abruptly while the future was running or pending., PeerInfo(_ip=IPv4Address('x.x.x.x'), _port=8444). Traceback (most recent call last):
chia  |   File "/chia-blockchain/chia/server/ws_connection.py", line 404, in wrapped_coroutine
chia  |     result: Message = await coroutine
chia  |   File "/chia-blockchain/chia/full_node/full_node_api.py", line 134, in new_peak
chia  |     await self.full_node.new_peak(request, peer)
chia  |   File "/chia-blockchain/chia/full_node/full_node.py", line 748, in new_peak
chia  |     if await self.short_sync_backtrack(
chia  |   File "/chia-blockchain/chia/full_node/full_node.py", line 675, in short_sync_backtrack
chia  |     await self.add_block(block, peer)
chia  |   File "/chia-blockchain/chia/full_node/full_node.py", line 1727, in add_block
chia  |     pre_validation_results = await self.blockchain.pre_validate_blocks_multiprocessing(
chia  |   File "/chia-blockchain/chia/consensus/blockchain.py", line 842, in pre_validate_blocks_multiprocessing
chia  |     return await pre_validate_blocks_multiprocessing(
chia  |   File "/chia-blockchain/chia/consensus/multiprocess_validation.py", line 380, in pre_validate_blocks_multiprocessing
chia  |     for batch_result in (await asyncio.gather(*futures))
chia  | concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

chia  | 2023-12-20T21:22:05.079 full_node full_node_server        : ERROR    Exception: Failed to fetch block 4685366 from PeerInfo(_ip=IPv4Address('x.x.x.x'), _port=8444), timed out, PeerInfo(_ip=IPv4Address('x.x.x.x'), _port=8444). Traceback (most recent call last):
chia  |   File "/chia-blockchain/chia/server/ws_connection.py", line 404, in wrapped_coroutine
chia  |     result: Message = await coroutine
chia  |   File "/chia-blockchain/chia/full_node/full_node_api.py", line 134, in new_peak
chia  |     await self.full_node.new_peak(request, peer)
chia  |   File "/chia-blockchain/chia/full_node/full_node.py", line 748, in new_peak
chia  |     if await self.short_sync_backtrack(
chia  |   File "/chia-blockchain/chia/full_node/full_node.py", line 663, in short_sync_backtrack
chia  |     raise ValueError(f"Failed to fetch block {curr_height} from {peer.get_peer_logging()}, timed out")
chia  | ValueError: Failed to fetch block 4685366 from PeerInfo(_ip=IPv4Address('x.x.x.x'), _port=8444), timed out
chia  | 
chia  | 2023-12-20T21:22:05.079 full_node full_node_server        : WARNING  Banning x.x.x.x for 10 seconds

chia  | 2023-12-20T21:22:59.288 Node healthcheck failed
chia  | 2023-12-20T21:23:59.387 Node healthcheck failed
chia  | 2023-12-20T21:24:59.524 Node healthcheck failed

wjblanke commented 9 months ago

A process in the process pool was terminated abruptly while the future was running or pending.

Ya, that's really bad. It means the Python process was killed. Do you have any other error reporting or crash dump from this Docker container? Any system logs? dmesg? journalctl? Is this using the official Docker image? Or can you link the one you are using? We are thinking this may be an OOM issue.
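To illustrate the point about a killed process, here is a minimal sketch (not taken from the Chia code base) that reproduces the same exception with the standard library alone. The function name `provoke_broken_pool` is hypothetical, and the `fork` start method is an assumption that holds on POSIX systems such as the Linux host in this report. The worker is made to exit abruptly, which from the pool's point of view is the same as being killed by the kernel.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def provoke_broken_pool() -> str:
    """Make a pool worker die abruptly and return the resulting error text."""
    # Assumption: "fork" is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        # os._exit terminates the worker immediately, without cleanup,
        # much like a process that is killed from outside.
        future = pool.submit(os._exit, 1)
        try:
            future.result(timeout=30)
        except BrokenProcessPool as exc:
            return str(exc)
    return "worker exited cleanly"


if __name__ == "__main__":
    print(provoke_broken_pool())
```

The exception only says that a worker disappeared, not why. That is why system logs such as dmesg or journalctl are needed to confirm or rule out the OOM killer.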

kicsiko commented 9 months ago

image: ghcr.io/chia-network/chia:latest (2.1.3)
os: Debian 12.4 (CLI only)
docker engine: 5:24.0.7-1~debian.12~bookworm (official repo)
system: Intel Celeron J3455 and AMD Ryzen 5 PRO 4650G (same problem on both PCs)

At the moment I have another problem. One of the crashes corrupted the database, so I need to resync (which is really slow during the dust storm). Could we have a function to resync from a specified block (like mmx-node)?

kicsiko commented 9 months ago

I deleted the entire .chia folder and got this error:

full_node chia.full_node.full_node: WARNING querying DNS introducer failed: All nameservers failed to answer the query seeder.xchpool.org. IN A: Server Do53:127.0.0.11@53 answered The DNS operation timed out after 2.000 seconds; Server Do53:127.0.0.11@53 answered The DNS operation timed out after 2.000 seconds; Server Do53:127.0.0.11@53 answered SERVFAIL

kicsiko commented 9 months ago

full_node chia.full_node.full_node: ERROR Error with syncing: <class 'sqlite3.DatabaseError'>Traceback (most recent call last):
chia |   File "/chia-blockchain/chia/full_node/full_node.py", line 979, in _sync
chia |     await self.blockchain.warmup(fork_point)
chia |   File "/chia-blockchain/chia/consensus/blockchain.py", line 917, in warmup
chia |     block_records = await self.block_store.get_block_records_in_range(
chia |   File "/chia-blockchain/chia/full_node/block_store.py", line 445, in get_block_records_in_range
chia |     async with conn.execute(
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/context.py", line 39, in __aenter__
chia |     self._obj = await self._coro
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 190, in execute
chia |     cursor = await self._execute(self._conn.execute, sql, parameters)
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 133, in _execute
chia |     return await future
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 106, in run
chia |     result = function()
chia | sqlite3.DatabaseError: database disk image is malformed
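A "database disk image is malformed" error can also be checked for outside the node. The sketch below is an illustration, not part of Chia: it opens a SQLite file read-only and runs SQLite's own `PRAGMA integrity_check`. The helper name `integrity_errors` and the `max_errors` parameter are hypothetical. On a database as large as a full node's, the check may take a long time.

```python
import sqlite3
from pathlib import Path


def integrity_errors(db_path: str, max_errors: int = 10) -> list[str]:
    """Return the problems SQLite reports for db_path, or [] if it looks healthy."""
    # Open read-only so that the check itself cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(f"PRAGMA integrity_check({int(max_errors)})").fetchall()
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the pragma returns any rows.
        return [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    # A healthy database yields the single row "ok".
    return [] if messages == ["ok"] else messages
```

An empty list means SQLite found nothing wrong at the storage level. It says nothing about whether the blockchain data itself is consistent.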

kicsiko commented 9 months ago

Another problem:

docker exec -it chia venv/bin/chia db backup --backup_file /backup/blockchain_v2_mainnet.sqlite.231028.bak
reading from blockchain database: /root/.chia/mainnet/db/blockchain_v2_mainnet.sqlite
writing to backup file: /backup/blockchain_v2_mainnet.sqlite.231028.bak
FAILED: backup failed with error: 'database or disk is full'
Your backup file /backup/blockchain_v2_mainnet.sqlite.231028.bak is probably left over in an insconsistent state.

Outside the container:

sqlite3 data/mainnet/db/blockchain_v2_mainnet.sqlite "vacuum into '/mnt/backup/chia/database/blockchain_v2_mainnet.sqlite.231028.bak'"
Error: stepping, database or disk is full (13)

I have 1.2 TB free space on the drive.
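As a point of comparison with `vacuum into`, Python's standard `sqlite3` module exposes SQLite's online backup API through `Connection.backup`. The sketch below is only an illustration with hypothetical names (`backup_database`, `pages_per_step`). Whether it would have avoided the "database or disk is full" error in this case is not established anywhere in this thread.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str, pages_per_step: int = 4096) -> None:
    """Copy src_path to dest_path using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copy in chunks of pages_per_step pages so that other connections
        # to the source database are not blocked for the whole copy.
        src.backup(dest, pages=pages_per_step)
    finally:
        dest.close()
        src.close()
```

A call would look like `backup_database("blockchain_v2_mainnet.sqlite", "/backup/blockchain_v2_mainnet.sqlite.bak")`, with both paths adjusted to the local setup.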

kicsiko commented 9 months ago

service: node (node only)

chia | 2024-01-06T00:40:50.608 full_node full_node_server : DEBUG <- new_transaction from peer 96a3b857012d2933617d7f89ab5627193883b4ce425fdaa43e20db9649717130 79.139.164.85
chia | 2024-01-06T00:40:50.608 full_node full_node_server : DEBUG Time taken to process new_transaction from c11a62031d1802ec122614ae8621e053a21f636debb82e557a76a9eb4c33af19 is 0.0012068748474121094 seconds
chia | 2024-01-06T00:40:50.609 full_node full_node_server : DEBUG Time taken to process new_transaction from 96a3b857012d2933617d7f89ab5627193883b4ce425fdaa43e20db9649717130 is 0.00042176246643066406 seconds
chia | 2024-01-06T00:40:50.609 full_node chia.consensus.block_body_validation: DEBUG Cost: 4279896540 max: 11000000000 percent full: 38.91%
chia | 2024-01-06T00:40:50.633 full_node full_node_server : DEBUG <- new_transaction from peer 28f411706f0097732110a30b2a676e40e7a4a525951412096a5ded299ee62a7a 118.123.16.49
chia | 2024-01-06T00:40:50.634 full_node full_node_server : DEBUG Time taken to process new_transaction from 28f411706f0097732110a30b2a676e40e7a4a525951412096a5ded299ee62a7a is 0.0005681514739990234 seconds
chia | 2024-01-06T00:40:50.680 full_node full_node_server : DEBUG <- new_transaction from peer 9247223a06ad76a11c63fc73fc990dd9de33db5208dacb37e0dfe183c4023126 190.51.7.252
chia | 2024-01-06T00:40:50.680 full_node full_node_server : DEBUG <- new_transaction from peer 2525ed13682937215b7d36919b92758e85b9aa844ce6a353ef9803022d7fff50 195.228.233.208
chia | 2024-01-06T00:40:50.680 full_node full_node_server : DEBUG <- new_transaction from peer c11a62031d1802ec122614ae8621e053a21f636debb82e557a76a9eb4c33af19 75.236.235.6
chia | 2024-01-06T00:40:50.681 full_node full_node_server : DEBUG <- new_transaction from peer 2fd9ccee172756deecc9c1cb9f349942b1de219e20842f740b32951754ea167a 190.7.28.114
chia | 2024-01-06T00:40:50.681 full_node full_node_server : DEBUG <- new_transaction from peer 224ba102882d1bde31062ba3907bafcd06dc861d2a08da0463d6164ba269c447 47.184.122.143
chia | 2024-01-06T00:40:50.681 full_node full_node_server : DEBUG Time taken to process new_transaction from 9247223a06ad76a11c63fc73fc990dd9de33db5208dacb37e0dfe183c4023126 is 0.0018465518951416016 seconds
chia | 2024-01-06T00:40:50.682 full_node full_node_server : DEBUG Time taken to process new_transaction from 2525ed13682937215b7d36919b92758e85b9aa844ce6a353ef9803022d7fff50 is 0.0015349388122558594 seconds
chia | 2024-01-06T00:40:50.682 full_node full_node_server : DEBUG Time taken to process new_transaction from c11a62031d1802ec122614ae8621e053a21f636debb82e557a76a9eb4c33af19 is 0.001432657241821289 seconds
chia | 2024-01-06T00:40:50.682 full_node full_node_server : DEBUG Time taken to process new_transaction from 2fd9ccee172756deecc9c1cb9f349942b1de219e20842f740b32951754ea167a is 0.001341104507446289 seconds
chia | 2024-01-06T00:40:50.682 full_node full_node_server : DEBUG Time taken to process new_transaction from 224ba102882d1bde31062ba3907bafcd06dc861d2a08da0463d6164ba269c447 is 0.0012969970703125 seconds
chia | 2024-01-06T00:41:14.642 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:41:44.659 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:42:14.690 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:42:44.696 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:43:14.716 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:43:44.747 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:44:14.773 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:44:44.804 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:45:14.819 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:45:44.850 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:46:14.856 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:46:30.722 Node healthcheck failed
chia | 2024-01-06T00:46:44.887 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:47:14.918 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:47:40.818 Node healthcheck failed
chia | 2024-01-06T00:47:44.945 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:48:14.976 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:48:44.992 daemon chia.daemon.server : DEBUG About to ping: chia_full_node
chia | 2024-01-06T00:48:44.998 daemon chia.daemon.server : WARNING Ping error to chia_full_node, closing connection.: ConnectionResetError: Cannot write to closing transport
chia | 2024-01-06T00:48:50.905 Node healthcheck failed
chia | 2024-01-06T00:50:01.007 Node healthcheck failed
chia | 2024-01-06T00:51:11.061 Node healthcheck failed
chia | 2024-01-06T00:52:21.165 Node healthcheck failed
chia | 2024-01-06T00:53:31.264 Node healthcheck failed

kicsiko commented 9 months ago

chia | 2024-01-06T16:52:01.255 full_node chia.server.address_manager: DEBUG address_manager.select_peer took 5.17e-05 seconds in new table.
chia | 2024-01-06T16:52:01.256 full_node chia.full_node.full_node: DEBUG Addrman selected address: PeerInfo(_ip=IPv4Address('108.65.121.175'), _port=8444).
chia | 2024-01-06T16:52:01.256 full_node chia.full_node.full_node: DEBUG Num peers needed: 6
chia | 2024-01-06T16:52:01.256 full_node full_node_server : DEBUG Connecting: wss://108.65.121.175:8444/ws, Peer info: PeerInfo(_ip=IPv4Address('108.65.121.175'), _port=8444)
chia | 2024-01-06T16:52:01.353 full_node chia.full_node.weight_proof: DEBUG check db for sub epoch 4672
chia | 2024-01-06T16:52:02.257 full_node chia.full_node.full_node: DEBUG Address manager query count: 0. Query limit: 10
chia | 2024-01-06T16:52:02.549 full_node chia.server.chia_policy : DEBUG Connection lost. Total connections: 0
chia | 2024-01-06T16:52:02.549 full_node full_node_server : WARNING Banning 90.188.243.194 for 10 seconds
chia | 2024-01-06T16:52:02.550 full_node full_node_server : INFO Connection closed: 90.188.243.194, node id: 44735b7e0bc377a155962032ff707afb3a5c75f7408f8f335e5996f955e38cd8
chia | 2024-01-06T16:52:02.550 full_node full_node_server : DEBUG Invalid connection type for connection 90.188.243.194, while closing. Handshake never finished.
chia | 2024-01-06T16:52:02.550 full_node chia.full_node.full_node: INFO peer disconnected PeerInfo(_ip=IPv4Address('90.188.243.194'), _port=2831)
chia | 2024-01-06T16:52:02.551 full_node full_node_server : WARNING Invalid handshake with peer. Maybe the peer is running old software.
chia | 2024-01-06T16:52:02.552 full_node full_node : INFO Waiting for RPC server
chia | 2024-01-06T16:52:02.552 daemon chia.daemon.server : DEBUG Received message: WSMessage(type=<WSMsgType.TEXT: 1>, data='{"ack": false, "command": "get_connections", "data": {"connections": [{"bytes_read": 61523, "bytes_written": 590, "creation_time": 1704559899.150454, "last_message_time": 1704559920.9599998, "local_port": 8444, "node_id": "0x331ae65fcc0719b8d8198bb80b7272013b8451458679f6232659f750432021fa", "peak_hash": "0x9b17190ea2e07c5c87ed20477278f95678fa8ff8d75bc4d109ec08354136413f", "peak_height": 4763352, "peak_weight": 12961508304, "peer_host": "99.184.65.94", "peer_port": 8444, "peer_server_port": 8444, "type": 1}, {"bytes_read": 29495, "bytes_written": 164, "creation_time": 1704559900.3856335, "last_message_time": 1704559900.866632, "local_port": 8444, "node_id": "0x8bf653c1e947753b206dc920477feb6376148310b475a4f640a2e170447f70a8", "peak_hash": "0x909321a6c0f5914a8781e79296b6d03bc897cdefc344486f3d9d5875e86b8c91", "peak_height": 4755435, "peak_weight": 12874101840, "peer_host": "162.239.117.102", "peer_port": 8444, "peer_server_port": 8444, "type": 1}], "success": true}, "destination": "wallet_ui", "origin": "chia_full_node", "request_id": "bb6559c39ca4963297b94a59c64c879af02e51fdc7dce251e56d7331080fc5d4"}', extra='')
chia | 2024-01-06T16:52:02.559 daemon chia.daemon.server : DEBUG Received message: WSMessage(type=<WSMsgType.CLOSE: 8>, data=1000, extra='')
chia | 2024-01-06T16:52:02.562 daemon chia.daemon.server : INFO Connection close requested. Closing websocket with ['chia_full_node'].
chia | 2024-01-06T16:52:02.758 full_node chia.server.address_manager: DEBUG address_manager.select_peer took 5.46e-05 seconds in new table.
chia | 2024-01-06T16:52:02.759 full_node chia.full_node.full_node: DEBUG Addrman selected address: PeerInfo(_ip=IPv4Address('178.197.206.22'), _port=8444).
chia | 2024-01-06T16:52:02.759 full_node chia.full_node.full_node: DEBUG Num peers needed: 6
chia | 2024-01-06T16:52:02.760 full_node full_node_server : DEBUG Connecting: wss://178.197.206.22:8444/ws, Peer info: PeerInfo(_ip=IPv4Address('178.197.206.22'), _port=8444)
chia | 2024-01-06T16:52:03.414 full_node chia.full_node.weight_proof: DEBUG check db for sub epoch 4673
chia | 2024-01-06T16:52:03.760 full_node chia.full_node.full_node: DEBUG Address manager query count: 0. Query limit: 10
chia | 2024-01-06T16:52:04.262 full_node chia.server.address_manager: DEBUG address_manager.select_peer took 5.56e-05 seconds in new table.
chia | 2024-01-06T16:52:04.262 full_node chia.full_node.full_node: DEBUG Addrman selected address: PeerInfo(_ip=IPv4Address('101.109.4.230'), _port=8444).
chia | 2024-01-06T16:52:04.262 full_node chia.full_node.full_node: DEBUG Num peers needed: 6
chia | 2024-01-06T16:52:04.263 full_node full_node_server : DEBUG Connecting: wss://101.109.4.230:8444/ws, Peer info: PeerInfo(_ip=IPv4Address('101.109.4.230'), _port=8444)
chia | 2024-01-06T16:52:04.559 full_node full_node : INFO Closed RPC server
chia | 2024-01-06T16:52:04.559 full_node full_node : INFO Service full_node at port 8444 fully stopped
chia | 2024-01-06T16:56:45.986 Node healthcheck failed
chia | 2024-01-06T16:57:56.085 Node healthcheck failed

kicsiko commented 9 months ago

opening file for reading: /root/.chia/mainnet/db/blockchain_v2_mainnet.sqlite
peak hash: 9b6fa15374a54aa9440d193dba5b51c60bc73bee8bf735cc3178c2647fc59910
peak height: 4754702
traversing the full chain
0 orphaned blocks: 5828
5828 orphaned blocks
DATABASE IS VALID: /root/.chia/mainnet/db/blockchain_v2_mainnet.sqlite

chia | 2024-01-07T02:37:58.517 full_node chia.full_node.full_node: INFO Start syncing from fork point at 4753567 up to 4765252
chia | 2024-01-07T02:38:02.041 Healthcheck(s) completed successfully
chia | 2024-01-07T02:38:03.121 full_node chia.consensus.blockchain: ERROR Error while adding block 44f265292bc04c56d542f67f85c3129a7fb7643a15c972bf6516535e0b419c99 height 4754703, rolling back: Traceback (most recent call last):
chia |   File "/chia-blockchain/chia/consensus/blockchain.py", line 461, in add_block
chia |     await self.block_store.add_full_block(header_hash, block, block_record)
chia |   File "/chia-blockchain/chia/full_node/block_store.py", line 142, in add_full_block
chia |     await conn.execute(
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 190, in execute
chia |     cursor = await self._execute(self._conn.execute, sql, parameters)
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 133, in _execute
chia |     return await future
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 106, in run
chia |     result = function()
chia | sqlite3.DatabaseError: database disk image is malformed
chia | database disk image is malformed
chia | 2024-01-07T02:38:03.121 full_node chia.full_node.full_node: ERROR sync from fork point failed: DatabaseError: database disk image is malformed
chia | Traceback (most recent call last):
chia |   File "/chia-blockchain/chia/util/log_exceptions.py", line 20, in log_exceptions
chia |     yield
chia |   File "/chia-blockchain/chia/full_node/full_node.py", line 1173, in sync_from_fork_point
chia |     await asyncio.gather(fetch_task, validate_task)
chia |   File "/chia-blockchain/chia/full_node/full_node.py", line 1135, in validate_block_batches
chia |     success, state_change_summary, err = await self.add_block_batch(
chia |   File "/chia-blockchain/chia/full_node/full_node.py", line 1307, in add_block_batch
chia |     result, error, state_change_summary = await self.blockchain.add_block(
chia |   File "/chia-blockchain/chia/consensus/blockchain.py", line 461, in add_block
chia |     await self.block_store.add_full_block(header_hash, block, block_record)
chia |   File "/chia-blockchain/chia/full_node/block_store.py", line 142, in add_full_block
chia |     await conn.execute(
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 190, in execute
chia |     cursor = await self._execute(self._conn.execute, sql, parameters)
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 133, in _execute
chia |     return await future
chia |   File "/chia-blockchain/venv/lib/python3.9/site-packages/aiosqlite/core.py", line 106, in run
chia |     result = function()
chia | sqlite3.DatabaseError: database disk image is malformed

github-actions[bot] commented 9 months ago

This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days.

wjblanke commented 8 months ago

chia | sqlite3.DatabaseError: database disk image is malformed

Once this happens in SQLite there isn't anything chia can do. Best thing is to redownload the DB from scratch and configure docker to have a bigger drive.
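Since the suggestion turns on how much space the container can actually use, a quick check from inside the container may help. This is a minimal sketch using only the standard library. The helper name `free_gib` is hypothetical, and the idea that a filesystem other than the 1.2 TB data drive filled up (for example the one holding temporary files) is an assumption, not something confirmed in this thread.

```python
import shutil
import tempfile


def free_gib(paths: list[str]) -> dict[str, float]:
    """Return the free space, in GiB, of the filesystem holding each path."""
    return {path: shutil.disk_usage(path).free / 2**30 for path in paths}


if __name__ == "__main__":
    # Hypothetical choice of locations: the temp directory plus the current
    # directory. Replace them with the database and backup directories.
    for path, gib in free_gib([tempfile.gettempdir(), "."]).items():
        print(f"{path}: {gib:.1f} GiB free")
```

Running it inside and outside the container shows whether both see the same amount of free space for the database, backup, and temporary directories.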

emlowe commented 8 months ago

Closing