XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License

noNetwork reset complete_ledgers #2461

Closed simaqiaoxi closed 5 years ago

simaqiaoxi commented 6 years ago

After a noNetwork event, should we expect complete_ledgers to reset? On a few occasions our rippled node has had a 'noNetwork' outage lasting anywhere from 10 minutes to 2 hours. This morning we managed to grab a few server_info outputs while it was occurring; could you tell us whether anything in this state looks unusual? We were surprised to see complete_ledgers cleared out. (ledger_history is set to 16.)

{
    "result": {
        "info": {
            "build_version": "0.90.0",
            "complete_ledgers": "37540990-37541006,37541114-37541125,37541224-37541265",
            "hostid": "72e6212c021f",
            "io_latency_ms": 1,
            "jq_trans_overflow": "0",
            "last_close": {
                "converge_time_s": 2.999,
                "proposers": 16
            },
            "load": {
                "job_types": [
                    {
                        "job_type": "untrustedProposal",
                        "peak_time": 2,
                        "per_second": 49
                    },
                    {
                        "in_progress": 1,
                        "job_type": "clientCommand",
                        "peak_time": 11,
                        "per_second": 7
                    },
                    {
                        "in_progress": 1,
                        "job_type": "updatePaths"
                    },
                    {
                        "job_type": "transaction",
                        "per_second": 5
                    },
                    {
                        "avg_time": 1,
                        "job_type": "batch",
                        "peak_time": 37,
                        "per_second": 5
                    },
                    {
                        "avg_time": 34,
                        "job_type": "advanceLedger",
                        "peak_time": 873,
                        "per_second": 2
                    },
                    {
                        "job_type": "fetchTxnData",
                        "peak_time": 47,
                        "per_second": 56
                    },
                    {
                        "avg_time": 1,
                        "job_type": "trustedValidation",
                        "peak_time": 20,
                        "per_second": 3
                    },
                    {
                        "avg_time": 1,
                        "job_type": "writeObjects",
                        "peak_time": 50,
                        "per_second": 6
                    },
                    {
                        "job_type": "trustedProposal",
                        "per_second": 7
                    },
                    {
                        "job_type": "peerCommand",
                        "peak_time": 2,
                        "per_second": 719
                    },
                    {
                        "avg_time": 3,
                        "job_type": "diskAccess",
                        "peak_time": 44,
                        "per_second": 2
                    },
                    {
                        "job_type": "processTransaction",
                        "per_second": 5
                    },
                    {
                        "job_type": "SyncReadNode",
                        "peak_time": 39,
                        "per_second": 2104
                    },
                    {
                        "job_type": "WriteNode",
                        "per_second": 280
                    }
                ],
                "threads": 6
            },
            "load_factor": 1,
            "peer_disconnects": "527",
            "peer_disconnects_resources": "0",
            "peers": 10,
            "pubkey_node": "n9Kd572TQTiF6xaDtu4DBugXpZMHxm7gcGVd4q9bHvcvnsLVHeNQ",
            "pubkey_validator": "none",
            "server_state": "full",
            "state_accounting": {
                "connected": {
                    "duration_us": "2823530613",
                    "transitions": 17
                },
                "disconnected": {
                    "duration_us": "2019303",
                    "transitions": 1
                },
                "full": {
                    "duration_us": "100298477",
                    "transitions": 1
                },
                "syncing": {
                    "duration_us": "505814044",
                    "transitions": 17
                },
                "tracking": {
                    "duration_us": "22",
                    "transitions": 1
                }
            },
            "uptime": 3430,
            "validated_ledger": {
                "age": 5,
                "base_fee_xrp": 0.00001,
                "hash": "79B5B91DF5679E9F18D88F75FA87680099CE99C994AC1631CB8160056E028A5E",
                "reserve_base_xrp": 20,
                "reserve_inc_xrp": 5,
                "seq": 37541265
            },
            "validation_quorum": 11,
            "validator_list_expires": "2018-Apr-09 00:00:00"
        },
        "status": "success"
    }
}
{
    "result": {
        "info": {
            "build_version": "0.90.0",
            "complete_ledgers": "37540990-37541006,37541114-37541125,37541224-37541387",
            "fetch_pack": 825,
            "hostid": "72e6212c021f",
            "io_latency_ms": 1,
            "jq_trans_overflow": "0",
            "last_close": {
                "converge_time_s": 2,
                "proposers": 0
            },
            "load": {
                "job_types": [
                    {
                        "job_type": "untrustedProposal",
                        "peak_time": 41,
                        "per_second": 48
                    },
                    {
                        "in_progress": 2,
                        "job_type": "ledgerData",
                        "waiting": 14
                    },
                    {
                        "in_progress": 1,
                        "job_type": "clientCommand",
                        "per_second": 2
                    },
                    {
                        "in_progress": 1,
                        "job_type": "updatePaths"
                    },
                    {
                        "job_type": "transaction",
                        "per_second": 10
                    },
                    {
                        "job_type": "batch",
                        "peak_time": 22,
                        "per_second": 8
                    },
                    {
                        "avg_time": 1,
                        "job_type": "advanceLedger",
                        "peak_time": 172,
                        "per_second": 8
                    },
                    {
                        "avg_time": 8,
                        "job_type": "trustedValidation",
                        "peak_time": 117,
                        "per_second": 3
                    },
                    {
                        "job_type": "writeObjects",
                        "peak_time": 117,
                        "per_second": 18
                    },
                    {
                        "job_type": "trustedProposal",
                        "peak_time": 21,
                        "per_second": 6
                    },
                    {
                        "job_type": "heartbeat",
                        "peak_time": 1
                    },
                    {
                        "job_type": "peerCommand",
                        "per_second": 727
                    },
                    {
                        "avg_time": 6,
                        "job_type": "diskAccess",
                        "peak_time": 112,
                        "per_second": 2
                    },
                    {
                        "job_type": "processTransaction",
                        "per_second": 10
                    },
                    {
                        "job_type": "SyncReadNode",
                        "peak_time": 87,
                        "per_second": 480
                    },
                    {
                        "job_type": "AsyncReadNode",
                        "peak_time": 87,
                        "per_second": 2082
                    },
                    {
                        "job_type": "WriteNode",
                        "per_second": 89
                    }
                ],
                "threads": 6
            },
            "load_factor": 1,
            "peer_disconnects": "600",
            "peer_disconnects_resources": "0",
            "peers": 10,
            "pubkey_node": "n9Kd572TQTiF6xaDtu4DBugXpZMHxm7gcGVd4q9bHvcvnsLVHeNQ",
            "pubkey_validator": "none",
            "server_state": "full",
            "state_accounting": {
                "connected": {
                    "duration_us": "2823530613",
                    "transitions": 17
                },
                "disconnected": {
                    "duration_us": "2019303",
                    "transitions": 1
                },
                "full": {
                    "duration_us": "695525893",
                    "transitions": 1
                },
                "syncing": {
                    "duration_us": "505814044",
                    "transitions": 17
                },
                "tracking": {
                    "duration_us": "22",
                    "transitions": 1
                }
            },
            "uptime": 4025,
            "validated_ledger": {
                "age": 141,
                "base_fee_xrp": 0.00001,
                "hash": "4B4CCD597327BFF4415F753FAC52C3E39DD4EF32826C63C64CFC52C5AF7D3777",
                "reserve_base_xrp": 20,
                "reserve_inc_xrp": 5,
                "seq": 37541387
            },
            "validation_quorum": 11,
            "validator_list_expires": "2018-Apr-09 00:00:00"
        },
        "status": "success"
    }
}

Also note the low uptime reported by server_info, even though the Docker container this runs in currently reports 47 hours of uptime (it was reset the last time the node was stuck reporting noNetwork for too long).

Any advice much appreciated. Also looking for notes on how we can encourage rippled to fill the gaps in these ledgers? We don't want a full history, but would like to have say... a full history since ledger 37541387
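For reference, the gaps in a complete_ledgers string can be listed with a few lines of Python. This is a minimal sketch and not part of rippled: the helper names `parse_ranges` and `find_gaps` are made up for illustration, and it assumes the comma-separated `start-end` format seen in the output above (or the literal `empty`).

```python
def parse_ranges(complete_ledgers):
    """Parse a complete_ledgers string into a sorted list of (start, end) tuples."""
    if complete_ledgers.strip() in ("", "empty"):
        return []
    ranges = []
    for part in complete_ledgers.split(","):
        start, _, end = part.strip().partition("-")
        # Assumption: a lone ledger may be written without a dash, e.g. "422".
        ranges.append((int(start), int(end or start)))
    return sorted(ranges)


def find_gaps(complete_ledgers):
    """Return the (first_missing, last_missing) spans between consecutive ranges."""
    ranges = parse_ranges(complete_ledgers)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


if __name__ == "__main__":
    # Value copied from the second server_info above.
    value = "37540990-37541006,37541114-37541125,37541224-37541387"
    for first, last in find_gaps(value):
        print(f"missing {first}-{last} ({last - first + 1} ledgers)")
```

For the second server_info this reports two missing spans, 37541007-37541113 and 37541126-37541223.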

simaqiaoxi commented 6 years ago

Another server_info where the state is connected instead of full, complete_ledgers is now empty and the closed ledger is... seq 422, the start of rippled history? Still reporting noNetwork for hours.

{
    "result": {
        "info": {
            "build_version": "0.90.0",
            "closed_ledger": {
                "age": 0,
                "base_fee_xrp": 0.00001,
                "hash": "24E533C0BA598B577F67CAE77E883721338F6120A559F83DB60602A86D00963A",
                "reserve_base_xrp": 200,
                "reserve_inc_xrp": 50,
                "seq": 422
            },
            "complete_ledgers": "empty",
            "fetch_pack": 21082,
            "hostid": "72e6212c021f",
            "io_latency_ms": 1,
            "jq_trans_overflow": "0",
            "last_close": {
                "converge_time_s": 1.999,
                "proposers": 16
            },
            "load": {
                "job_types": [
                    {
                        "job_type": "untrustedProposal",
                        "peak_time": 2,
                        "per_second": 54
                    },
                    {
                        "avg_time": 16,
                        "in_progress": 2,
                        "job_type": "ledgerData",
                        "peak_time": 287,
                        "per_second": 13,
                        "waiting": 7
                    },
                    {
                        "in_progress": 1,
                        "job_type": "clientCommand"
                    },
                    {
                        "job_type": "transaction",
                        "peak_time": 2,
                        "per_second": 18
                    },
                    {
                        "job_type": "batch",
                        "per_second": 5
                    },
                    {
                        "job_type": "advanceLedger",
                        "per_second": 10
                    },
                    {
                        "job_type": "fetchTxnData",
                        "peak_time": 1,
                        "per_second": 45
                    },
                    {
                        "avg_time": 1,
                        "job_type": "trustedValidation",
                        "peak_time": 34,
                        "per_second": 3
                    },
                    {
                        "job_type": "writeObjects",
                        "peak_time": 1,
                        "per_second": 253
                    },
                    {
                        "avg_time": 1,
                        "job_type": "acceptLedger",
                        "peak_time": 3
                    },
                    {
                        "job_type": "trustedProposal",
                        "per_second": 7
                    },
                    {
                        "avg_time": 1,
                        "job_type": "heartbeat",
                        "peak_time": 2
                    },
                    {
                        "job_type": "peerCommand",
                        "per_second": 706
                    },
                    {
                        "job_type": "diskAccess",
                        "peak_time": 1,
                        "per_second": 3
                    },
                    {
                        "job_type": "processTransaction",
                        "per_second": 9
                    },
                    {
                        "job_type": "SyncReadNode",
                        "peak_time": 13,
                        "per_second": 374
                    },
                    {
                        "job_type": "AsyncReadNode",
                        "peak_time": 80,
                        "per_second": 1054
                    },
                    {
                        "job_type": "WriteNode",
                        "per_second": 400
                    }
                ],
                "threads": 6
            },
            "load_factor": 1,
            "peer_disconnects": "337",
            "peer_disconnects_resources": "0",
            "peers": 9,
            "pubkey_node": "n9Kd572TQTiF6xaDtu4DBugXpZMHxm7gcGVd4q9bHvcvnsLVHeNQ",
            "pubkey_validator": "none",
            "published_ledger": "none",
            "server_state": "connected",
            "state_accounting": {
                "connected": {
                    "duration_us": "2293891912",
                    "transitions": 2
                },
                "disconnected": {
                    "duration_us": "1545861",
                    "transitions": 1
                },
                "full": {
                    "duration_us": "69015070",
                    "transitions": 1
                },
                "syncing": {
                    "duration_us": "0",
                    "transitions": 0
                },
                "tracking": {
                    "duration_us": "29",
                    "transitions": 1
                }
            },
            "uptime": 2364,
            "validation_quorum": 11,
            "validator_list_expires": "2018-Apr-09 00:00:00"
        },
        "status": "success"
    }
}
simaqiaoxi commented 6 years ago

Hi team,

This is still occurring in one form or another. Today's noNetwork outage started at approximately the validator_list_expires time; not sure if it's related (although it's the same date as above). We are restarting our node after 10 days of uptime and a 5-hour outage (it is currently ~5800 ledgers behind).

{"result":{"info":{"build_version":"0.90.0","complete_ledgers":"37764442-37824703","hostid":"847dfb34d1c9","io_latency_ms":1,"jq_trans_overflow":"0","last_close":{"converge_time_s":2,"proposers":0},"load":{"job_types":[{"job_type":"untrustedValidation","per_second":3},{"job_type":"untrustedProposal","per_second":54},{"in_progress":1,"job_type":"clientCommand"},{"job_type":"peerCommand","per_second":665},{"job_type":"SyncReadNode","per_second":14},{"job_type":"AsyncReadNode","per_second":161},{"job_type":"WriteNode","peak_time":22,"per_second":13}],"threads":6},"load_factor":1,"peer_disconnects":"73","peer_disconnects_resources":"0","peers":10,"pubkey_node":"n9JjN4AXCPubKDLsnYVV91kBpLFKka36U22t6EgkFKRTfgQnBhys","pubkey_validator":"none","server_state":"connected","state_accounting":{"connected":{"duration_us":"21711911524","transitions":2},"disconnected":{"duration_us":"1251378","transitions":1},"full":{"duration_us":"917565801412","transitions":9},"syncing":{"duration_us":"20838061","transitions":9},"tracking":{"duration_us":"646","transitions":9}},"uptime":939292,"validated_ledger":{"age":20735,"base_fee_xrp":1e-05,"hash":"8B23D185179F860736EFD893B495F9F9DCE1E7130E077DCF2A7AF0DBF79B8358","reserve_base_xrp":20,"reserve_inc_xrp":5,"seq":37824703},"validation_quorum":4294967295,"validator_list_expires":"2018-Apr-09 00:00:00"},"status":"success"}}

Note the close time of ledger 37824703: "close_time": 576547200, "close_time_human": "2018-Apr-09 00:00:00"
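The close_time value can be checked by hand. XRP Ledger timestamps count seconds since 2000-01-01 00:00:00 UTC, so a short sketch like the following (the function name `ripple_time_to_utc` is only illustrative) converts the number quoted above into a calendar date:

```python
from datetime import datetime, timedelta, timezone

# XRP Ledger timestamps count seconds from 2000-01-01 00:00:00 UTC.
RIPPLE_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def ripple_time_to_utc(close_time):
    """Convert a ledger close_time (seconds since the Ripple epoch) to a UTC datetime."""
    return RIPPLE_EPOCH + timedelta(seconds=close_time)


if __name__ == "__main__":
    # close_time of ledger 37824703, as quoted above.
    closed = ripple_time_to_utc(576547200)
    print(closed.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-04-09 00:00:00
```

The result is the same instant as the validator_list_expires value in the server_info output, assuming that field is also reported in UTC.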

nbougalis commented 5 years ago

> Another server_info where the state is connected instead of full, complete_ledgers is now empty and the closed ledger is... seq 422, the start of rippled history? Still reporting noNetwork for hours.

Your server had restarted fairly recently when you got this report (see the uptime field).
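As a rough illustration of reading the uptime field, the sketch below formats it as hours, minutes and seconds and compares it with the time covered by state_accounting. The helper name `describe_uptime` is hypothetical, and the idea that the state_accounting durations add up to roughly the uptime is an observation from the reports in this thread, not a documented guarantee.

```python
from datetime import timedelta


def describe_uptime(info):
    """Return (uptime as H:MM:SS, seconds covered by state_accounting)."""
    uptime = timedelta(seconds=info["uptime"])
    # duration_us values are strings holding microseconds.
    accounted_us = sum(
        int(state["duration_us"]) for state in info["state_accounting"].values()
    )
    return str(uptime), accounted_us / 1_000_000


if __name__ == "__main__":
    # Fields copied from the server_info that reported closed ledger seq 422.
    info = {
        "uptime": 2364,
        "state_accounting": {
            "connected": {"duration_us": "2293891912", "transitions": 2},
            "disconnected": {"duration_us": "1545861", "transitions": 1},
            "full": {"duration_us": "69015070", "transitions": 1},
            "syncing": {"duration_us": "0", "transitions": 0},
            "tracking": {"duration_us": "29", "transitions": 1},
        },
    }
    uptime, accounted = describe_uptime(info)
    print(f"uptime {uptime}, state_accounting covers {accounted:.0f} s")
```

An uptime of 2364 seconds is a little under 40 minutes, so the process had been running for far less time than the container.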

> Also looking for notes on how we can encourage rippled to fill the gaps in these ledgers? We don't want a full history, but would like to have say... a full history since ledger 37541387

rippled will backfill missing ledgers as it can. Its first priority is to stay in sync with the network. Once it can do that and provide timely service to clients, then and only then will it attempt to backfill.

I suspect that the issues you're seeing are indicative of a connectivity or other issue. You suggest that the UNL was due to expire around the time this happened. We've improved the code used to retrieve the UNL in the latest release of rippled, and this may address the issues you were intermittently experiencing that would cause your server's UNL to not update in a timely fashion.
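A monitoring script could flag this condition from server_info alone. The following is a minimal sketch under stated assumptions: `unl_problems` and `parse_expiry` are hypothetical helper names, the expiry string is assumed to be UTC in the `2018-Apr-09 00:00:00` form shown in this thread, and treating a validation_quorum of 4294967295 (2**32 - 1, the value in the last report) as a sign that the quorum cannot be met is an interpretation, not something stated here.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Assumption: 2**32 - 1 is a sentinel rather than a real quorum size.
UNREACHABLE_QUORUM = 4294967295


def parse_expiry(text):
    """Parse a string like '2018-Apr-09 00:00:00'; UTC is assumed."""
    date_part, time_part = text.split()
    year, month, day = date_part.split("-")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS.index(month) + 1, int(day),
                    hour, minute, second, tzinfo=timezone.utc)


def unl_problems(info, now):
    """Return a list of human-readable warnings about the validator list."""
    problems = []
    expires = info.get("validator_list_expires", "")
    # Values that do not start with a year are skipped rather than guessed at.
    if expires[:1].isdigit() and parse_expiry(expires) <= now:
        problems.append("validator list expired at " + expires)
    if info.get("validation_quorum") == UNREACHABLE_QUORUM:
        problems.append("validation_quorum is 2**32 - 1")
    return problems


if __name__ == "__main__":
    # Fields copied from the last server_info in this thread.
    info = {"validation_quorum": 4294967295,
            "validator_list_expires": "2018-Apr-09 00:00:00"}
    now = datetime(2018, 4, 9, 5, 45, tzinfo=timezone.utc)  # illustrative time
    for problem in unl_problems(info, now):
        print(problem)
```

Passing `now` explicitly keeps the check easy to test; a real script would use `datetime.now(timezone.utc)`.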

I am going to close this issue. If you continue experiencing issues running rippled, please feel free to open a new one.