InsufficientNetworkMode - no *.ripple.com peers (Version: 1.6.0)

ghost commented 3 years ago

Issue Description

I have two machines running rippled. Since Friday, both of them have been returning InsufficientNetworkMode for calls such as account_info. This started happening without any action on my part.

Steps to Reproduce

Start rippled and wait for consensus (server_info > server_state is "full").

To troubleshoot, I added this to my existing config:

...
[ips_fixed]
xrpl.ws 443
r.ripple.com 51234
s1.ripple.com 51234
s2.ripple.com 51234
zaphod.alloy.ee 51235
...

If I check rippled peers, it only returns 127.0.0.1: and ...51235 addresses which, according to dig -x, belong to zaphod.alloy.ee. I tried rippled connect <ip> 51234 to add s2.ripple.com, but although it returns success, the new address is not added to the peers list. I have also tried opening the peer port, but it had no effect. Is there a problem with *.ripple.com addresses?

Expected Result

account_info should return the account information.

Actual Result

I get an "InsufficientNetworkMode" error message.

ximinez commented 3 years ago

"InsufficientNetworkMode" means that your rippled node is not able to get in sync or stay in sync with the rest of the network. The two most common causes are insufficient connectivity / networking issues, and insufficient computing resources. Sometimes, this happens during transaction spikes when an otherwise sufficient system gets overwhelmed. Most of the time the issue will resolve itself, and the node will backfill any ledgers it missed.
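A quick way to see whether a node considers itself in sync is to look at server_state in the server_info output. The sketch that follows reads that field from saved server_info JSON and flags whether it is one of the states a fully synced node reports. Treating "full", "proposing", and "validating" as "synced" is an assumption for illustration, not rippled's exact rule for returning InsufficientNetworkMode; the helper names are hypothetical.

```python
import json

# Assumed set of server_state values that indicate a fully synced node.
# "proposing" and "validating" are reported by validators in full mode.
SYNCED_STATES = {"full", "proposing", "validating"}


def server_state(server_info_json: str) -> str:
    """Extract result.info.server_state from saved `rippled server_info` output."""
    data = json.loads(server_info_json)
    return data["result"]["info"]["server_state"]


def looks_synced(state: str) -> bool:
    """Heuristic check: True only for states we assume mean 'in sync'."""
    return state in SYNCED_STATES


# Usage with a trimmed-down response shaped like the one posted later in this thread.
sample = '{"result": {"info": {"server_state": "connected"}, "status": "success"}}'
state = server_state(sample)
print(state, looks_synced(state))
```

A node stuck in "connected" or "syncing" would be flagged as not synced, which matches the symptom of API calls being refused.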

A good starting point is to check the system requirements page to ensure you have sufficient resources available. If you're still unable to stay synced, then post the output of rippled server_info here, and carefully scrub your rippled.cfg file to remove any secrets and post the scrubbed version of that here, too.
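Scrubbing rippled.cfg before posting can be partly automated. The following is a minimal sketch, assuming the secrets live in the [validation_seed], [validator_token], and [node_seed] sections and in admin_user / admin_password settings inside port stanzas; that list is an assumption and not exhaustive, so the output should still be reviewed by hand. The function name scrub_cfg is hypothetical.

```python
import re

# Sections assumed to contain secrets; every non-blank line under them is redacted.
SECRET_SECTIONS = {"validation_seed", "validator_token", "node_seed"}

# key = value settings (e.g. in port stanzas) assumed to be sensitive.
SECRET_KEYS = {"admin_user", "admin_password"}


def scrub_cfg(text: str) -> str:
    """Return rippled.cfg text with assumed-secret values replaced by <redacted>."""
    out = []
    in_secret = False
    for line in text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"\[([^\]]+)\]", stripped)
        if header:
            # Entering a new [section]; remember whether it holds secrets.
            in_secret = header.group(1) in SECRET_SECTIONS
            out.append(line)
        elif in_secret and stripped:
            out.append("<redacted>")
        elif "=" in stripped and stripped.split("=", 1)[0].strip() in SECRET_KEYS:
            key = stripped.split("=", 1)[0].strip()
            out.append(f"{key} = <redacted>")
        else:
            out.append(line)
    return "\n".join(out)


cfg = "[node_seed]\nsXXXXXXXX\n\n[port_rpc_admin_local]\nport = 5005\nadmin_password = hunter2\n"
print(scrub_cfg(cfg))
```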

ghost commented 3 years ago

Hi @ximinez, thank you for your advice. I know what the possible reasons are, but I don't understand why this happened on both of my machines without my changing anything, or why I don't have any *.ripple.com peers. This has been the situation for 3 days now.

I have 23 GB of RAM and 6 cores; I guess that should be enough. Here's my server_info:

{
  "result": {
    "info": {
      "build_version": "1.6.0",
      "complete_ledgers": "60828267-60828272",
      "hostid": "******",
      "io_latency_ms": 1,
      "jq_trans_overflow": "0",
      "last_close": {
        "converge_time_s": 2.001,
        "proposers": 0
      },
      "load": {
        "job_types": [
          {
            "job_type": "untrustedValidation",
            "peak_time": 1,
            "per_second": 81
          },
          {
            "job_type": "untrustedProposal",
            "per_second": 1
          },
          {
            "avg_time": 39,
            "in_progress": 1,
            "job_type": "ledgerData",
            "peak_time": 541,
            "per_second": 3
          },
          {
            "in_progress": 1,
            "job_type": "clientCommand",
            "per_second": 1
          },
          {
            "in_progress": 1,
            "job_type": "updatePaths"
          },
          {
            "job_type": "advanceLedger",
            "peak_time": 3,
            "per_second": 9
          },
          {
            "job_type": "fetchTxnData",
            "per_second": 43
          },
          {
            "job_type": "trustedValidation",
            "peak_time": 2,
            "per_second": 26
          },
          {
            "job_type": "writeObjects",
            "peak_time": 1,
            "per_second": 24
          },
          {
            "job_type": "trustedProposal",
            "per_second": 36
          },
          {
            "job_type": "peerCommand",
            "per_second": 365
          },
          {
            "job_type": "SyncReadNode",
            "peak_time": 8,
            "per_second": 1715
          },
          {
            "job_type": "AsyncReadNode",
            "peak_time": 5,
            "per_second": 3398
          },
          {
            "job_type": "WriteNode",
            "per_second": 56
          }
        ],
        "threads": 6
      },
      "load_factor": 1,
      "peer_disconnects": "558",
      "peer_disconnects_resources": "0",
      "peers": 10,
      "pubkey_node": "******",
      "pubkey_validator": "none",
      "published_ledger": 60828272,
      "server_state": "connected",
      "server_state_duration_us": "1548579243",
      "state_accounting": {
        "connected": {
          "duration_us": "4871418631",
          "transitions": 6
        },
        "disconnected": {
          "duration_us": "79361085",
          "transitions": 4
        },
        "full": {
          "duration_us": "0",
          "transitions": 0
        },
        "syncing": {
          "duration_us": "10043647",
          "transitions": 2
        },
        "tracking": {
          "duration_us": "0",
          "transitions": 0
        }
      },
      "time": "2021-Jan-11 14:11:46.813513 UTC",
      "uptime": 4960,
      "validated_ledger": {
        "age": 1608,
        "base_fee_xrp": 1e-05,
        "hash": "49E517A8BB2D37F4A74B1E929209210E381B477D66EEE74E8E9A6C2913455AAC",
        "reserve_base_xrp": 20,
        "reserve_inc_xrp": 5,
        "seq": 60828289
      },
      "validation_quorum": 31,
      "validator_list": {
        "count": 1,
        "expiration": "2021-Feb-21 00:00:00.000000000 UTC",
        "status": "active"
      }
    },
    "status": "success"
  }
}

ximinez commented 3 years ago

I see you have 10 peers, but I also notice you have 558 peer disconnects. Some disconnects are normal, so that number will increase over time, but with an uptime of only 4960 seconds, that's one disconnect about every 9 seconds. It's also taking an average of 39 milliseconds to process ledger data. Those both seem pretty high.
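The disconnect arithmetic, plus a look at state_accounting, can be reproduced from the posted server_info values. This is a small sketch with hypothetical helper names; it uses only fields present in the output above (uptime in seconds, peer_disconnects as a string, duration_us in microseconds).

```python
# Values copied from the server_info output posted above (trimmed).
info = {
    "uptime": 4960,
    "peer_disconnects": "558",
    "state_accounting": {
        "connected": {"duration_us": "4871418631", "transitions": 6},
        "full": {"duration_us": "0", "transitions": 0},
    },
}


def disconnect_interval_s(info: dict) -> float:
    """Average seconds between peer disconnects since the server started."""
    disconnects = int(info["peer_disconnects"])  # reported as a string
    return info["uptime"] / disconnects if disconnects else float("inf")


def seconds_in_state(info: dict, state: str) -> float:
    """Total time spent in a given server_state, converted from microseconds."""
    return int(info["state_accounting"][state]["duration_us"]) / 1e6


print(round(disconnect_interval_s(info), 1))
print(seconds_in_state(info, "full"), seconds_in_state(info, "connected"))
```

The zero duration for "full" shows the node never reached full sync during this uptime, while it spent most of it in "connected".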

My best guess right now is that you don't have sufficient IOPS to keep up when things get busy. What kind of hard drive are you using? We recommend an SSD with a minimum of 1000 IOPS. It's also possible that your network connection is having trouble. Can you run some kind of network speed and latency test?

alloynetworks commented 3 years ago

Replying really late, apologies! The [ips_fixed] entries are the problem: xrpl.ws is a WebSocket cluster, not a peer cluster, and the other servers' peer port should be 51235, not 51234.

[ips_fixed]
r.ripple.com 51235
s1.ripple.com 51235
s2.ripple.com 51235
zaphod.alloy.ee 51235

may yield you better results. If you don't have [peer_private] set to 1, then you can actually comment out this entire section.
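Mismatched ports like the ones above are easy to spot mechanically. This is a minimal sketch, assuming [ips_fixed] entries are written as "host port" on one line each, as in this thread; EXPECTED_PEER_PORT is set to 51235 because that is the peer port given in the reply for these hosts, and treating it as correct for any other host is an assumption. The helper names are hypothetical.

```python
# Peer port given in the reply for the *.ripple.com hubs and zaphod.alloy.ee.
EXPECTED_PEER_PORT = 51235


def parse_section(cfg_text: str, name: str) -> list[str]:
    """Return the non-empty, non-comment lines of the [name] stanza."""
    lines, inside = [], False
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            inside = line[1:-1] == name
        elif inside and line and not line.startswith("#"):
            lines.append(line)
    return lines


def suspicious_ips_fixed(cfg_text: str) -> list[str]:
    """Flag [ips_fixed] entries with an explicit port other than EXPECTED_PEER_PORT."""
    flagged = []
    for entry in parse_section(cfg_text, "ips_fixed"):
        parts = entry.split()
        if len(parts) > 1 and parts[1] != str(EXPECTED_PEER_PORT):
            flagged.append(entry)
    return flagged


original = """[ips_fixed]
xrpl.ws 443
r.ripple.com 51234
s1.ripple.com 51234
s2.ripple.com 51234
zaphod.alloy.ee 51235
"""
print(suspicious_ips_fixed(original))
```

Run against the reporter's original section, it flags every entry except zaphod.alloy.ee, which is consistent with the only peers seen being zaphod.alloy.ee addresses.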
