NethermindEth / nethermind

A robust execution client for Ethereum node operators.
https://nethermind.io/nethermind-client
GNU General Public License v3.0

[Enhancement] [1.10.27] JsonRPC, Metrics & HealthChecks Not Running While in DbLoad #2864

Open Texnomic opened 3 years ago

Texnomic commented 3 years ago

I'm running the Mainnet Archive config and the node is in DbLoad sync mode: Syncing previously downloaded blocks from DB (partial offline mode until it finishes).

All of the endpoints are down: JsonRPC, Metrics & HealthChecks.

Note: the node is still processing previously downloaded blocks.

Config File:

{
  "Init": {
    "WebSocketsEnabled": true,
    "StoreReceipts": true,
    "IsMining": false,
    "ChainSpecPath": "chainspec/foundation.json",
    "GenesisHash": "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3",
    "BaseDbPath": "nethermind_db/mainnet_archive",
    "LogFileName": "mainnet_archive.logs.txt",
    "MemoryHint": 10240000000
  },
  "Network": {
    "DiscoveryPort": 30303,
    "P2PPort": 30303,
    "ActivePeersMaxCount": 200
  },
  "JsonRpc": {
    "Enabled": true,
    "Timeout": 20000,
    "Host": "127.0.0.1",
    "Port": 8545
  },
  "TxPool": {
    "Size": 2048
  },
  "Db": {
    "CacheIndexAndFilterBlocks": true
  },
  "Sync": {
    "DownloadBodiesInFastSync": false,
    "DownloadReceiptsInFastSync": false,
    "UseGethLimitsInFastBlocks": true
  },
  "EthStats": {
    "Enabled": false,
    "Server": "wss://ethstats.net/api",
    "Name": "Nethermind",
    "Secret": "secret",
    "Contact": "hello@nethermind.io"
  },
  "Metrics": {
    "NodeName": "Nethermind",
    "Enabled": true,
    "PushGatewayUrl": "http://10.0.0.33:9091/metrics",
    "IntervalSeconds": 5
  },
  "HealthChecks": {
    "Enabled": true,
    "WebhooksEnabled": false,
    "WebhooksUri": "https://slack.webhook",
    "UIEnabled": true,
    "PollingInterval": 5,
    "Slug": "/api/health",
    "MaxIntervalWithoutProcessedBlock": 15,
    "MaxIntervalWithoutProducedBlock": 45
  }
}

tkstanczak commented 3 years ago

Hi @Texnomic - this was an early design decision: we stop all calls while the node syncs from the DB. Syncing from the DB happens when you stop the node during archive sync after the blocks have already been downloaded from the network. Since these blocks can then be processed entirely offline, we shut down networking while doing so to speed up the sync.

We have seen in the past that this behaviour was undesired by some users, so we would appreciate it if you could share your opinion here, as we may change this behaviour.

Texnomic commented 3 years ago

@tkstanczak I can understand the design decision, but at least Health Checks & Monitoring should be enabled. Otherwise the node is completely silent :)

dB2510 commented 2 years ago

@Texnomic can you please have a look at #3680 and check whether it solves the issue?

AliakseiMalyshau commented 1 year ago

@tkstanczak @LukaszRozmej @dB2510 hello there!

I have the same issue on Nethermind client version 1.13.3. Has it been fixed or not?

Why I'm asking: when I ran the Nethermind client in Archive mode for the first time, JsonRpc initialized correctly. But whenever I restart the client for any reason, JsonRpc is not initialized, even though the logs show that the node is syncing.

Configuration:

{
  "Init": {
    "DiscoveryEnabled": true,
    "WebSocketsEnabled": true,
    "StoreReceipts" : true,
    "ChainSpecPath": "chainspec/fuse.json",
    "BaseDbPath": "nethermind_db/fuse_archive",
    "LogFileName": "fuse_archive.logs.txt",
    "StaticNodesPath": "Data/static-nodes-fuse.json"
  },
  "Network": {
    "DiscoveryPort": 30303,
    "P2PPort": 30303,
    "LocalIp": "0.0.0.0",
    "ExternalIp": "0.0.0.0"
  },
  "JsonRpc": {
    "Enabled": true,
    "Timeout": 20000,
    "Host": "0.0.0.0",
    "Port": 8545,
    "WebSocketsPort": 8546
  },
  "Metrics": {
    "NodeName": "Fuse_archive"
  },
  "Bloom": {
    "IndexLevelBucketSizes": [
      16,
      16,
      16
    ]
  },
  "Pruning": {
    "Mode": "None"
  },
  "Mining": {
    "MinGasPrice": "10000000000"
  }
}

Thank you!

begetan commented 1 year ago

It's a severe issue. I can't believe it was done intentionally! I spent half of the night trying to understand what was wrong with my node and configs.

People use monitoring software to check sync status, yet the node can shut down its RPC endpoint for days or weeks. That is unacceptable behavior for most applications and a huge argument against using this software.
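
For monitoring setups hit by this, a minimal sketch of a probe that reports an unreachable RPC endpoint as its own state instead of treating it as a dead node. It assumes the JSON-RPC host and port from the config posted earlier in this thread (127.0.0.1:8545) and uses the standard Ethereum `eth_syncing` method. The function names and status strings are made up for illustration and are not part of Nethermind.

```python
import json
import urllib.error
import urllib.request


def classify_response(body: bytes) -> str:
    """Map an eth_syncing JSON-RPC response body to a coarse status string."""
    try:
        data = json.loads(body)
    except ValueError:
        return "unknown"
    if not isinstance(data, dict):
        return "unknown"
    result = data.get("result")
    if result is False:
        # eth_syncing returns false when the node is not syncing.
        return "synced"
    if isinstance(result, dict):
        # eth_syncing returns an object with progress fields while syncing.
        return "syncing"
    return "unknown"


def probe(url: str = "http://127.0.0.1:8545", timeout: float = 3.0) -> str:
    """Call eth_syncing once and report the outcome.

    The default URL is an assumption taken from the config in this thread.
    """
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_syncing", "params": []}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_response(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing is listening on the port.
        # While the node is in DbLoad this does not mean the process is dead.
        return "rpc-unreachable"


if __name__ == "__main__":
    print(probe())
```

An alerting rule could then treat `rpc-unreachable` as a warning while the node process is still alive, rather than as an outage.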

crypto0243 commented 1 year ago

I spent more than a week before I found this issue and learned that RPC does not listen while the node is syncing old blocks. This needs to be addressed to avoid confusion.

MaxTeiger commented 1 year ago

I ran into the same problem. Is there a way to know how many blocks the node needs to sync before it goes back online with RPC etc. enabled?

Jack-Works commented 1 year ago

I want to make some admin RPC calls while syncing, and that is impossible too.

bazzilic commented 8 months ago

It would also be very helpful if the logs said something more informative than "Syncing previously downloaded blocks from DB (partial offline mode until it finishes)". For example, how far along are we, and will it take half an hour or weeks?
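
Until the logs report progress, a rough estimate can be worked out by hand from two processed-block numbers read off the logs some time apart. A minimal sketch, assuming the operator supplies those numbers and the target block themselves; the function name and all figures below are hypothetical, and the real processing rate will vary with block contents.

```python
from typing import Optional


def estimate_remaining_seconds(
    block_then: int,
    block_now: int,
    elapsed_seconds: float,
    target_block: int,
) -> Optional[float]:
    """Linear ETA from two observations of the processed block number.

    Returns None when no progress was observed, since no rate can be derived.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rate = (block_now - block_then) / elapsed_seconds  # blocks per second
    if rate <= 0:
        return None
    remaining_blocks = max(target_block - block_now, 0)
    return remaining_blocks / rate


if __name__ == "__main__":
    # Hypothetical numbers: 3,000 blocks processed in 10 minutes,
    # with blocks already downloaded up to 5,000,000.
    eta = estimate_remaining_seconds(1_000_000, 1_003_000, 600, 5_000_000)
    print(f"about {eta / 86400:.1f} days remaining")
```

With these made-up inputs the rate is 5 blocks per second, which leaves roughly nine days of processing, so the difference between "half an hour" and "weeks" is easy to spot even from a crude estimate.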

kamilchodola commented 3 months ago

@LukaszRozmej I think this no longer impacts performance that much, right? At least on archive nodes it would hardly be visible, since archive sync is not the fastest in the current design, so maybe we could just enable the endpoints again?

@Demuirgos you were working on enabling JsonRpc earlier in node startup, so maybe you would like to pick this one up as well?

LukaszRozmej commented 3 months ago

I remember this being changed back and forth. Can you check the current state?

kamilchodola commented 3 months ago

I will test it as follows:

- Start NodeA with processing disabled and let it download about 5 million blocks.
- Start NodeB and let it download only a small number of blocks (about 500k).
- Start NodeC and let it process just a minimal number of blocks (to make sure it has started processing and is not stuck on BeaconHeaders).

Then stop all three, restart them, and see how fast each one reaches about 2 million blocks.

Not sure whether there is a better test case for that.