Concordium / concordium-node

The main concordium node implementation.
GNU Affero General Public License v3.0

Node doesn't appear on the dashboard consistently after being restarted #258

Closed by mh-concordium 2 years ago

mh-concordium commented 2 years ago

Bug Description

The node doesn't consistently appear on the Dashboard after being restarted. This issue was seen on Mac and Windows.

Deleting the database files and restarting the node seems to be a workaround: the node appears on the Dashboard once the database files have been deleted and the node restarted.

Steps to Reproduce

  1. Install and configure the node.
  2. Set the log level to debug.
  3. Wait for the node to catch up, then verify that it is visible on the dashboard.
  4. Restart the node.
  5. Observe the dashboard and logs.

Expected Result

After the node has started, it should appear on the Dashboard.

Actual Result

The node disappears from the Dashboard. The issue is inconsistent: sometimes the node appears on the Dashboard after 10-20 minutes, and sometimes it does not appear at all. However, if the database files are deleted, the node appears on the Dashboard shortly after it starts (within minutes).

The logs are flooded with the following line:

gRPC failed with “transport error: error trying to connect: tcp connect error: Connection refused (os error 61)” for http://localhost:10002/, sleeping for 5000 ms

However, once this line stops appearing in the logs, the node seems to be working fine, but it's still not visible on the Dashboard.
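The "Connection refused" line above means nothing was accepting TCP connections on the node's RPC port at that moment. A minimal Python sketch for checking this by hand is shown here; it assumes the RPC endpoint is localhost:10002 as in the log line, the function name is hypothetical, and it only tests that the port accepts TCP connections, not that the gRPC service itself is healthy.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout, or resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Includes ConnectionRefusedError, which is what "os error 61" is on macOS.
        return False
```

Calling `port_accepts_connections("localhost", 10002)` repeatedly while the node starts gives a rough idea of how long it takes before the RPC port opens.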

Logs: logs.txt

Versions

abizjak commented 2 years ago

This is almost certainly because the node takes a very long time to start, and by then the collector has already given up, so your node does not appear on the dashboard.

For the Windows node, there is a workaround: set the start delay

collector.env.CONCORDIUM_NODE_COLLECTOR_ARTIFICIAL_START_DELAY = = "360000"

(the value is in ms, so you might have to increase it if it takes longer for your node to start)

See also https://github.com/Concordium/concordium-node/issues/244 which will address this issue as well once it's fixed.

The Mac node should have a similar workaround: set the value of the CONCORDIUM_NODE_COLLECTOR_ARTIFICIAL_START_DELAY variable.
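Since the delay is given in milliseconds, a small helper can turn an observed startup time into a value for this variable. This is only a sketch: the function name and the `margin` safety factor are hypothetical, not part of the node or collector, and the value is returned as a string because the example above quotes it.

```python
import math


def start_delay_ms(startup_seconds: float, margin: float = 1.5) -> str:
    """Convert an observed node startup time (seconds) into a millisecond string.

    `margin` is a hypothetical safety factor (>= 1) so that the delay is
    somewhat longer than the startup time that was actually observed.
    """
    if startup_seconds < 0 or margin < 1:
        raise ValueError("startup_seconds must be >= 0 and margin must be >= 1")
    # Round up to a whole number of milliseconds.
    return str(math.ceil(startup_seconds * margin * 1000))
```

For example, a node observed to take 4 minutes (240 seconds) to start gives `"360000"` with the default margin, which matches the 6-minute value suggested above.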

abizjak commented 2 years ago

Can you try this out @mh-concordium and confirm it is the same issue?

mh-concordium commented 2 years ago

Yes, I will look at it. Thanks.

mh-concordium commented 2 years ago

I have added that to the configuration (it seems that it works with one "=" sign) and restarted the node. I can see that the node is responsive and working: commands such as raw GetNodeInfo return the info, but the node still hasn't appeared on the Dashboard after 45 minutes.

abizjak commented 2 years ago

Ok, and what is your configuration file? Is the dashboard URL correct?

Just to be clear, you are trying to use the stagenet node, right?

mh-concordium commented 2 years ago

Yes, stagenet. I didn't change the URL, just pasted the configuration from our wiki.

[node.stagenet]
enabled = true
name = "Stagenet node"
bootstrap_nodes = "bootstrap.stagenet.concordium.com:8888"
config_dir = 'stagenet\config'
data_dir = 'stagenet\data'
listen.port = 8987
listen.ip = "0.0.0.0"
rpc.port = 10002
rpc.ip = "127.0.0.1"
log.level = "debug"
log.path = 'stagenet\logs\stagenet.log'
log.roll.size = '50mb'
log.roll.count = 2
collector.url = 'https://dashboard.stagenet.concordium.com/nodes/post'
collector.enabled = true
collector.node_name = '''mh-Win'''

abizjak commented 2 years ago

#244 and #245 have now hopefully been addressed. We should recheck once we have a stagenet node with those fixes included, which will most likely be after Easter. Will you follow up then, @mh-concordium?

mh-concordium commented 2 years ago

I wasn't able to reproduce the issue.

Node version: 4.0.9
Client version: 4.0.2

abizjak commented 2 years ago

Most likely fixed by #275