dashpay / dash

0.13.1.0 dashd crashes regularly masternodes #2695

Closed PanderMusubi closed 5 years ago

PanderMusubi commented 5 years ago

Describe the issue

0.13.1.0 dashd crashes masternodes every 16-23 minutes on Debian and 12-18 minutes on Ubuntu.

Can you reliably reproduce the issue?

Yes

If so, please list the steps to reproduce below:

  1. Existing masternodes that have been upgraded many times before
  2. Upgraded to 0.13.1.0
  3. Sentinel added
  4. remove chainstate and blocks
  5. Bootstrapped with Block 1018704: Sun Feb 10 00:00:01 UTC 2019
  6. Automatic start via crontab

Expected behaviour

dashd should not stop that quickly

Actual behaviour

dashd stops regularly

What version of Dash Core are you using?

0.13.1.0

Machine specs:

Any extra information that might be useful in the debugging process.

The output of tail -f debug.log

...
CMasternodeSync::ProcessTick -- nTick 685 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
AcceptConnection -- masternode is not synced yet, skipping inbound connection attempt
CMasternodeSync::ProcessTick -- nTick 691 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
receive version message: /Dash Core:0.13.0/: version 70213, blocks=1019165, us=83.96.xxx.yyy:35426, peer=12
ThreadSocketHandler -- removing node: peer=12 addr=199.247.xxx.yyy:9999 nRefCount=1 fInbound=0 fMasternode=0
CMasternodeSync::ProcessTick -- nTick 697 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
CMasternodeSync::ProcessTick -- nTick 703 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
CMasternodeSync::ProcessTick -- nTick 709 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
AcceptConnection -- masternode is not synced yet, skipping inbound connection attempt
CMasternodeSync::ProcessTick -- nTick 715 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
AcceptConnection -- masternode is not synced yet, skipping inbound connection attempt
CMasternodeSync::ProcessTick -- nTick 721 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
CMasternodeSync::ProcessTick -- nTick 727 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
CMasternodeSync::ProcessTick -- nTick 733 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000
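
The repeated ProcessTick lines above can be checked mechanically. This is a minimal sketch that assumes the line format stays exactly as in the excerpt; the "stalled" rule (same asset and same progress over the last few ticks) is an illustrative heuristic, not something Dash Core itself defines.

```python
import re

# Matches the CMasternodeSync::ProcessTick lines shown in the excerpt above.
TICK_RE = re.compile(
    r"CMasternodeSync::ProcessTick -- nTick (\d+) nCurrentAsset (\d+) "
    r"nTriedPeerCount (\d+) nSyncProgress (-?\d+\.\d+)"
)


def parse_ticks(lines):
    """Return (tick, asset, tried_peers, progress) tuples for matching lines."""
    ticks = []
    for line in lines:
        m = TICK_RE.search(line)
        if m:
            ticks.append(
                (int(m.group(1)), int(m.group(2)), int(m.group(3)), float(m.group(4)))
            )
    return ticks


def looks_stalled(ticks, min_ticks=5):
    """Heuristic (an assumption): the last min_ticks entries all report
    the same asset and the same progress value."""
    if len(ticks) < min_ticks:
        return False
    recent = ticks[-min_ticks:]
    return len({(asset, progress) for _, asset, _, progress in recent}) == 1


# Lines copied from the debug.log excerpt above.
sample = [
    "CMasternodeSync::ProcessTick -- nTick 685 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000",
    "AcceptConnection -- masternode is not synced yet, skipping inbound connection attempt",
    "CMasternodeSync::ProcessTick -- nTick 691 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000",
    "CMasternodeSync::ProcessTick -- nTick 697 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000",
    "CMasternodeSync::ProcessTick -- nTick 703 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000",
    "CMasternodeSync::ProcessTick -- nTick 709 nCurrentAsset 0 nTriedPeerCount 0 nSyncProgress -0.250000",
]

ticks = parse_ticks(sample)
print(len(ticks), looks_stalled(ticks))
```

In practice the lines would be read from debug.log rather than from a hard-coded list.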

This is output of tail -f sentinel-cron.log

...
dashd not synced with network! Awaiting full sync before running Sentinel.
-28: Verifying wallet...
Cannot connect to dashd. Please ensure dashd is running and the JSONRPC port is open to Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
-28: RPC server started
Cannot connect to dashd. Please ensure dashd is running and the JSONRPC port is open to Sentinel.
dashd not synced with network! Awaiting full sync before running Sentinel.
...

Miscellaneous further details

"version": 130100,
"protocolversion": 70213,
"walletversion": 61000,
"blocks": 0,
"errors": ""
"AssetName": "MASTERNODE_SYNC_INITIAL",
"IsBlockchainSynced": false,
"IsMasternodeListSynced": false,
"IsWinnersListSynced": false,
"IsSynced": false,
"IsFailed": false
"service": "[::]:0",
"status": "Node just started, not yet activated"

Questions

  1. When I run dashd and it exits, there is no warning or error and the return value is 0
  2. What other files can I purge in order to get a clean start?
  3. What debugging can I use in order to investigate further?
  4. How do I do a completely clean start? Only keep wallet and config files?

UdjinM6 commented 5 years ago

What are your machine specs more specifically? How much RAM do you have, and do you have swap enabled? Is there enough free space on the HDD?

PanderMusubi commented 5 years ago

Debian 6 GB RAM and 14 GB swap with 70 GB free HD

Ubuntu 2 GB RAM and 2 GB swap with 10 GB free HD

Neither runs out of memory. Both have been running Dash masternodes across several versions of Dash for a while, so they are up to the task. What debug logging should I run in order to determine the cause of the crash?
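
One way to record how a process ends is a small wrapper around it. This is a sketch, not part of Dash Core: it assumes dashd would be run in the foreground (not in daemon mode) so the wrapper sees its exit, and `run_and_report` is a hypothetical helper name. A stand-in command is used here so the example runs anywhere.

```python
import subprocess
import sys
import time


def run_and_report(cmd):
    """Run cmd in the foreground and return how and when it ended."""
    started = time.time()
    proc = subprocess.run(cmd)
    return {
        "cmd": cmd,
        # On POSIX, a negative value means the process was killed by that signal number.
        "returncode": proc.returncode,
        "seconds": round(time.time() - started, 1),
    }


# Stand-in for the real command line; replace with the actual dashd invocation.
report = run_and_report([sys.executable, "-c", "raise SystemExit(0)"])
print(report)
```

A return code of 0, as in the report above, suggests a normal shutdown path rather than a kill by signal.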

UdjinM6 commented 5 years ago

Hmmm... Not sure why it crashed, but it looks like it didn't even complete the initial sync and in fact was not really doing anything. Check your connections with the getinfo and getpeerinfo RPCs. Try adding some fresh nodes from https://www.dashninja.pl/masternodes.html via addnode IPhere add, and removing netfulfilled.dat (to allow syncing from recent nodes again).

Also, try running it with debug=1 in dash.conf. This should spam a lot in debug.log but might give some insights about the crash.
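
The RPC checks suggested above could be scripted roughly as follows. This sketch only builds JSON-RPC request bodies and summarises a getpeerinfo-style result; it sends nothing over the network. The node address and the sample peer list are made-up placeholders, and `rpc_payload` and `count_directions` are hypothetical helper names.

```python
import json


def rpc_payload(method, params=None, request_id=0):
    """Build a JSON-RPC request body for the given method."""
    return json.dumps(
        {"jsonrpc": "1.0", "id": request_id, "method": method, "params": params or []}
    )


def count_directions(peers):
    """Count inbound vs outbound peers in a getpeerinfo-style list of dicts."""
    inbound = sum(1 for p in peers if p.get("inbound"))
    return {"inbound": inbound, "outbound": len(peers) - inbound}


# Placeholder address from a documentation-only IP range, not a real masternode.
add_request = rpc_payload("addnode", ["203.0.113.5:9999", "add"])
peers_request = rpc_payload("getpeerinfo")

# Made-up sample shaped like a getpeerinfo result (only the fields used here).
sample_peers = [
    {"addr": "203.0.113.5:9999", "inbound": False},
    {"addr": "203.0.113.9:35426", "inbound": True},
]

print(add_request)
print(count_directions(sample_peers))
```

An empty or very short peer list would be consistent with the node "not really doing anything" during initial sync.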

PanderMusubi commented 5 years ago

Thanks. I have started both again from only config + wallet + bootstrap and they work now. I think you are right that the bootstrap didn't complete and that it could never recover from that.

Remarkable that it happened on both independent machines and that they both resulted in an unstable situation that kept on crashing. Not sure how that state could be detected or handling of it improved.
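
One possible way to spot that state from outside is to look at the sync flags quoted earlier in the thread. This is only a sketch: it assumes those fields are available as a single JSON object (the thread does not say which RPC calls produced them), and `sync_summary` is a hypothetical helper.

```python
import json


def sync_summary(status):
    """Summarise the sync flags from a status dict like the one quoted earlier."""
    flags = [
        "IsBlockchainSynced",
        "IsMasternodeListSynced",
        "IsWinnersListSynced",
        "IsSynced",
    ]
    pending = [name for name in flags if not status.get(name, False)]
    failed = bool(status.get("IsFailed", False))
    return {
        "asset": status.get("AssetName"),
        "failed": failed,
        "pending": pending,
        "ready": not pending and not failed,
    }


# Values copied from the details in the original report.
raw = """
{
  "AssetName": "MASTERNODE_SYNC_INITIAL",
  "IsBlockchainSynced": false,
  "IsMasternodeListSynced": false,
  "IsWinnersListSynced": false,
  "IsSynced": false,
  "IsFailed": false
}
"""

summary = sync_summary(json.loads(raw))
print(summary)
```

A node that stays in the initial asset with every flag pending across repeated checks, while not reporting failure, matches what was seen here.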

Another minor thing is the message

Not capable masternode: Invalid protocol version

that also gets reported by Sentinel. All it takes then is to start the masternode from the Qt client, but the message is a bit confusing at first.

If you think no further action is needed, you can close this issue as solved.

UdjinM6 commented 5 years ago

Glad you solved it! It's indeed very interesting that you were able to reproduce it, probably smth we should keep in mind (and maybe we should fix this in develop @codablock).

Anyway, thanks for reporting back! 👍 Closing.

PanderMusubi commented 5 years ago

No problem. I think it happened because I restarted dashd mid-bootstrap.