[Bug] Node does not update synchronisation target while running

ast0815 commented 2 years ago

What happened?

My node no longer updates the blockchain height while running: its status stays stuck indefinitely at, e.g., "Current Blockchain Status: Not Synced. Peak height: 1104575". The only way I can get it to update is by restarting the node. It then sets the chain height to the current value and starts syncing up to that point, after which it gets stuck again.

I'm not sure whether it is related, but I recently had to resynchronise from scratch, and my farming pool went offline during that period. So right now I would be farming with an invalid pool. I tried to switch to self-pooling, but it seems this transaction will not be considered until I somehow get my node in sync with the chain.

I have posted some hopefully relevant lines of the log. These kinds of errors/warnings just repeat over and over again.

Any help is appreciated.

Version

1.2.11

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

2021-11-06T22:06:45.756 farmer chia.farmer.farmer         : ERROR    Exception in GET /pool_info https://farm.poolharvest.io, Cannot connect to host farm.poolharvest.io:443 ssl:<ssl.SSLContext object at 0xffff90764f40> [Connect call failed ('147.135.129.11', 443)]
2021-11-06T22:06:45.757 farmer chia.farmer.farmer         : WARNING  No pool specific authentication_token_timeout has been set for 502b749031f10eb1c3d659842d55b720f086692aaabe7c1761caae849cb62bca, check communication with the pool.
2021-11-06T22:07:20.599 wallet chia.wallet.wallet_node    : WARNING  SpendBundle has been rejected by the FullNode. {'error': 'NO_TRANSACTIONS_WHILE_SYNCING',
 'status': 3,
 'txid': '0xceb4d1358d81dbf4dced21afc60e9b805ab180557a71c9b6e13ab66db60af688'}
2021-11-06T22:07:22.463 wallet chia.wallet.wallet_node    : WARNING  SpendBundle has been rejected by the FullNode. {'error': 'NO_TRANSACTIONS_WHILE_SYNCING',
 'status': 3,
 'txid': '0xceb4d1358d81dbf4dced21afc60e9b805ab180557a71c9b6e13ab66db60af688'}
2021-11-06T22:07:24.280 wallet chia.wallet.wallet_node    : WARNING  SpendBundle has been rejected by the FullNode. {'error': 'NO_TRANSACTIONS_WHILE_SYNCING',
 'status': 3,
 'txid': '0xceb4d1358d81dbf4dced21afc60e9b805ab180557a71c9b6e13ab66db60af688'}
2021-11-06T22:07:25.032 wallet chia.wallet.wallet_node    : WARNING  SpendBundle has been rejected by the FullNode. {'error': 'NO_TRANSACTIONS_WHILE_SYNCING',
 'status': 3,
 'txid': '0xceb4d1358d81dbf4dced21afc60e9b805ab180557a71c9b6e13ab66db60af688'}
2021-11-06T22:07:26.114 wallet chia.wallet.wallet_node    : WARNING  SpendBundle has been rejected by the FullNode. {'error': 'NO_TRANSACTIONS_WHILE_SYNCING',
 'status': 3,
 'txid': '0xceb4d1358d81dbf4dced21afc60e9b805ab180557a71c9b6e13ab66db60af688'}
2021-11-06T22:07:47.199 farmer chia.farmer.farmer         : ERROR    Exception in GET /pool_info https://farm.poolharvest.io, Cannot connect to host farm.poolharvest.io:443 ssl:<ssl.SSLContext object at 0xffff90764ec0> [Connect call failed ('147.135.129.11', 443)]
2021-11-06T22:07:47.201 farmer chia.farmer.farmer         : WARNING  No pool specific authentication_token_timeout has been set for 502b749031f10eb1c3d659842d55b720f086692aaabe7c1761caae849cb62bca, check communication with the pool.

...

2021-11-07T02:01:37.832 full_node chia.full_node.full_node: WARNING  Block pre-validation time: 10.25 seconds
2021-11-07T02:02:02.459 full_node chia.full_node.coin_store: WARNING  It took 21.79s to apply 774 additions and 24 removals to the coin store. Make sure blockchain databa

loppefaaret commented 2 years ago

First things first: get the full_node in sync. Anything in the log that is not from the full_node (and possibly the daemon) is of no value right now. Any chance you could upload the entire debug.log? Ideally restart Chia first and let it run for 15 minutes, so we get a "fresh" look at the log, in case anything pops up in the first few minutes.
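Filtering the log down to the relevant services can be scripted. Below is a minimal sketch, assuming the debug.log record layout seen in the lines quoted in this thread (timestamp, service name, logger name, level, message) and assuming daemon lines carry the service name `daemon`. Wrapped lines, such as traceback frames or the continuation of the SpendBundle dicts, are attached to the preceding record.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed record layout, based on the lines quoted in this thread:
# "<timestamp> <service> <logger>[padding]: <LEVEL>[padding] <message>"
RECORD_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<service>\S+) (?P<logger>[\w.]+)\s*: (?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def records(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw log lines into records.

    Lines that do not start a new record (traceback frames, wrapped dicts,
    blank lines) are appended to the previous record's message.
    """
    current: Optional[dict] = None
    for line in lines:
        line = line.rstrip("\n")
        m = RECORD_RE.match(line)
        if m:
            if current is not None:
                yield current
            current = m.groupdict()
        elif current is not None:
            current["msg"] += "\n" + line
    if current is not None:
        yield current


def only_services(
    lines: Iterable[str], services: Tuple[str, ...] = ("full_node", "daemon")
) -> List[dict]:
    """Keep only records whose service name is in `services`."""
    return [r for r in records(lines) if r["service"] in services]
```

For example, passing the lines of ~/.chia/mainnet/log/debug.log (the usual location) to `only_services` drops the farmer and wallet noise such as the pool connection errors and the SpendBundle rejections.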

ast0815 commented 2 years ago

I did that and found no lines from the node itself other than the ones quoted above. But I will try again and provide the whole log. Maybe I missed something.

ast0815 commented 2 years ago

Ok, I stopped everything and started only the node. For some reason it was then able to sync, and it has stayed synced so far. Even after starting the farmer and wallet again, it looks to be working now. The transaction to switch to self-pooling also seems to have been accepted.

This is weird and I assume there is still a bug to be found somewhere, but my problem seems to be solved for now.

ast0815 commented 2 years ago

I still get the same problem that every now and then my node just stops updating the synchronisation target. Here is my latest log file:

debug.log

It has been stuck in the current state since roughly 21:24. It is now 23:43.

ast0815 commented 2 years ago

According to my separate logs of the farm status, it first lost sync around 02:27:

2021-11-22T02:16:34.180 full_node chia.full_node.coin_store: WARNING  It took 14.30s to apply 733 additions and 69 removals to the coin store. Make sure blockchain database is on a fast drive
2021-11-22T02:16:47.808 full_node chia.full_node.full_node: WARNING  Block validation time: 29.40 seconds, pre_validation time: 0.98 seconds, cost: 2191189935, percent full: 19.92%
2021-11-22T02:27:37.847 full_node chia.full_node.full_node: ERROR    got weight proof request for unknown peak 76227384d4ce40e7f83cfef38e5fa92265693262e7e87b83573e2b469c4b8f17
2021-11-22T04:19:43.014 full_node chia.full_node.full_node: ERROR    Respond peers exception: 'NoneType' object has no attribute 'host'. Traceback: Traceback (most recent call last):
  File "/home/chia/chia_env/lib/python3.8/site-packages/chia/server/node_discovery.py", line 610, in respond_peers
    await self.add_peers_neighbour(request.peer_list, peer_src)
  File "/home/chia/chia_env/lib/python3.8/site-packages/chia/server/node_discovery.py", line 575, in add_peers_neighbour
    neighbour_data = (neighbour_info.host, neighbour_info.port)
AttributeError: 'NoneType' object has no attribute 'host'

2021-11-22T04:19:43.016 full_node chia.full_node.full_node: ERROR    Respond peers exception: 'NoneType' object has no attribute 'host'. Traceback: Traceback (most recent call last):
  File "/home/chia/chia_env/lib/python3.8/site-packages/chia/server/node_discovery.py", line 610, in respond_peers
    await self.add_peers_neighbour(request.peer_list, peer_src)
  File "/home/chia/chia_env/lib/python3.8/site-packages/chia/server/node_discovery.py", line 575, in add_peers_neighbour
    neighbour_data = (neighbour_info.host, neighbour_info.port)
AttributeError: 'NoneType' object has no attribute 'host'

2021-11-22T04:19:43.018 full_node chia.full_node.full_node: ERROR    Respond peers exception: 'NoneType' object has no attribute 'host'. Traceback: Traceback (most recent call last):
  File "/home/chia/chia_env/lib/python3.8/site-packages/chia/server/node_discovery.py", line 610, in respond_peers
    await self.add_peers_neighbour(request.peer_list, peer_src)
  File "/home/chia/chia_env/lib/python3.8/site-packages/chia/server/node_discovery.py", line 575, in add_peers_neighbour
    neighbour_data = (neighbour_info.host, neighbour_info.port)
AttributeError: 'NoneType' object has no attribute 'host'

I guess the errors around 04:19 must either be a consequence of whatever went wrong, or unrelated.
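The traceback shows `neighbour_info` being `None` when `add_peers_neighbour` builds its `(host, port)` tuple. As an illustration only, here is the kind of `None` guard that avoids this exception; `PeerInfo` and `neighbour_pairs` are hypothetical stand-ins, not chia's actual code, and any upstream fix may look different.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PeerInfo:
    # Hypothetical stand-in for the peer record referenced in the traceback.
    host: str
    port: int


def neighbour_pairs(infos: List[Optional[PeerInfo]]) -> List[Tuple[str, int]]:
    """Build (host, port) pairs, skipping entries that are None instead of
    raising AttributeError as in the traceback above."""
    pairs: List[Tuple[str, int]] = []
    for info in infos:
        if info is None:
            # Missing neighbour info: skip it rather than dereference None.
            continue
        pairs.append((info.host, info.port))
    return pairs
```

A guard like this only silences the symptom; it says nothing about why the entry was missing, which fits the guess that these errors are unrelated to the sync stall.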

loppefaaret commented 2 years ago

I can't find what you are describing in the debug.log above, going back a couple of days; there is nothing logged around 21:24. If you switch on INFO-level logging, we might get some useful entries in your debug.log. You can switch it from the CLI with chia configure --log-level INFO, or by manually editing ~/.chia/mainnet/config/config.yaml, finding the line log_level: WARNING and changing it to log_level: INFO. Save the file and restart Chia.
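`chia configure --log-level INFO` is the simpler route. If you do edit the file by hand or from a script, a minimal sketch of the edit could look like this. It assumes the setting appears as a `log_level: <LEVEL>` line in config.yaml and rewrites every such line as plain text rather than parsing the YAML; `set_log_level` is a hypothetical helper, not part of chia.

```python
import re

# Standard Python logging level names.
VALID_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"}

# Assumption: the setting sits on its own line as "log_level: <LEVEL>".
# [ \t]* (not \s*) keeps the match from spilling across newlines.
LOG_LEVEL_RE = re.compile(r"^([ \t]*log_level:[ \t]*)\S+", re.MULTILINE)


def set_log_level(config_text: str, level: str) -> str:
    """Return config_text with every log_level line set to `level`."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    # Keep the original indentation and key; replace only the value.
    new_text, count = LOG_LEVEL_RE.subn(lambda m: m.group(1) + level, config_text)
    if count == 0:
        raise ValueError("no 'log_level:' line found")
    return new_text
```

As with a manual edit, Chia has to be restarted for the new level to take effect.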

I do notice that you got some warnings about long block validation times, which could indicate that you are on slower storage media. You might want to consider moving your database to a faster drive. This can also be done by manually editing config.yaml: under the full_node: section, set database_path: to the full path of the new location, then close Chia, move the database file, and open Chia again.
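To see how often these slow operations occur, the timings can be pulled out of the warnings. This sketch matches the three warning formats quoted in this thread; the 10-second default threshold is an arbitrary assumption, not a chia setting, and `slow_operations` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple

# Patterns copied from the WARNING lines quoted in this thread.
PATTERNS = {
    "coin_store_apply": re.compile(r"It took (\d+(?:\.\d+)?)s to apply"),
    "block_validation": re.compile(r"Block validation time: (\d+(?:\.\d+)?) seconds"),
    "pre_validation": re.compile(r"Block pre-validation time: (\d+(?:\.\d+)?) seconds"),
}


def slow_operations(
    lines: Iterable[str], threshold_s: float = 10.0
) -> List[Tuple[str, str, float]]:
    """Return (timestamp, kind, seconds) for each timing at or above threshold_s."""
    hits: List[Tuple[str, str, float]] = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m and float(m.group(1)) >= threshold_s:
                # The first 23 characters hold the timestamp, e.g. 2021-11-22T02:16:34.180
                hits.append((line[:23], kind, float(m.group(1))))
    return hits
```

Plotting these hits over time would show whether the slow periods line up with the moments the node stops syncing.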

ast0815 commented 2 years ago

The DB is on an external HDD, connected via USB 3. I have noticed that I get a few "too late" partials on my pool, but it is not so bad overall.

I have switched the log level to INFO. After restarting the node today for an unrelated reason (the disk with the DB had a hiccup), this is the log: debug.log

After restarting this afternoon, it synced up to peak height 1199808 and then just sat there. I have restarted the node again now, since repeated restarts seem to be the only thing that makes it sync.

loppefaaret commented 2 years ago

It does seem filled with potential issues with the connection to the blockchain database, from what I can gather. Plenty of I/O errors are reported, and the node seems to have a hard time serving the wallet's requests for initial weight proofs and sync. If you have the option of moving the database to an internal drive, or perhaps even an SSD, you might see improved performance.

ast0815 commented 2 years ago

I am not worried about overall performance, but about the fact that it just completely stops syncing. I am aware that my RasPi setup will not be very high-performance, but a total stop like this seems like something that should not happen. I would much prefer not to buy an SSD just to store the database.

Here is another log: debug.log

It just stopped syncing around 6:30 AM this morning, probably somewhere around line 11900 in the log. After around that time the log is just full of lines reading

2021-12-03T06:27:59.631 full_node chia.full_node.full_node_store: INFO     Don't have challenge hash d8682f550902a7f08e61aa3cbe05625e974cdc1455cb5882a9cff848b5247eb3, caching EOS
2021-12-03T06:27:59.632 full_node chia.full_node.full_node: INFO     End of slot not added CC challenge af9afc20899b28b2dffc4e5bdcf65352b5b5a4ccb9de429731e1bc6a8aa1610a

which repeats many times per second for a few minutes.
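Finding where such a flood starts by hand is tedious in a large log. A minimal sketch, assuming lines are in chronological order and each relevant line starts with the usual `YYYY-MM-DDTHH:MM:SS` timestamp: it returns the 1-based line number of the first match in the first second that reaches a chosen rate. The default of 5 matches per second is an arbitrary choice, and `flood_start` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def flood_start(lines: Iterable[str], needle: str, per_second: int = 5) -> Optional[int]:
    """Return the line number of the first `needle` line in the earliest
    second containing at least `per_second` matching lines, or None."""
    counts: Counter = Counter()
    first_line_in_second: Dict[str, int] = {}
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            second = line[:19]  # "YYYY-MM-DDTHH:MM:SS", milliseconds dropped
            counts[second] += 1
            first_line_in_second.setdefault(second, lineno)
            if counts[second] >= per_second:
                return first_line_in_second[second]
    return None
```

Called with the needle "End of slot not added" on the attached debug.log, this would point at the line where the repeated messages begin, which can then be compared with the rough estimate of line 11900.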

I have now implemented a script that automatically restarts my node when it stops syncing, but I would still prefer to understand this issue and fix it properly.
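A watchdog like that needs a stall criterion. One possible sketch, not the script described above: treat the node as stalled when its status keeps reporting "Not Synced" at the same peak height for longer than a chosen interval. It assumes the status text contains the line quoted at the top of this issue (for example from `chia show -s`); the 30-minute default and the `StallDetector` name are hypothetical, and the restart action itself is left out.

```python
import re
from typing import Optional

# Matches the status line quoted at the top of this issue, e.g.
# "Current Blockchain Status: Not Synced. Peak height: 1104575"
NOT_SYNCED_RE = re.compile(r"Current Blockchain Status: Not Synced\. Peak height: (\d+)")


def stuck_height(status_text: str) -> Optional[int]:
    """Return the peak height if the status reports 'Not Synced', else None."""
    m = NOT_SYNCED_RE.search(status_text)
    return int(m.group(1)) if m else None


class StallDetector:
    """Flags a stall when 'Not Synced' persists at one height for max_idle_s seconds."""

    def __init__(self, max_idle_s: float = 1800.0) -> None:
        self.max_idle_s = max_idle_s
        self.height: Optional[int] = None
        self.since: Optional[float] = None

    def observe(self, status_text: str, now: float) -> bool:
        height = stuck_height(status_text)
        if height is None:
            # Synced, syncing, or unrecognised output: not a stall by this definition.
            self.height = None
            self.since = None
            return False
        if height != self.height:
            # New (or first) 'Not Synced' height: start timing from here.
            self.height = height
            self.since = now
            return False
        return now - self.since >= self.max_idle_s
```

Calling `observe` periodically with the current status text and a timestamp (e.g. `time.monotonic()`) returns True once the height has been stuck for `max_idle_s` seconds; what to do then, such as restarting the node, is up to the caller.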

Thanks for your time, by the way. I appreciate it.
