XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License

ledger_history does not sync specified amount of ledgers #3250

Closed l4zygreed closed 4 years ago

l4zygreed commented 4 years ago

I started a node connected to mainnet, with the following lines in the config file:

[ledger_history]
2250000

[node_db]
...
online_delete=2250000

[ips_fixed]
54.245.168.196 51235
169.54.2.152 51235
34.248.24.149 51235

The full-node IPs were copied from https://livenet.xrpl.org/network/.

But my node syncs only ~1 GB of ledgers and then changes server_state to full in the response to the server_info command.

How can I sync the specified amount of ledger history?
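For scale, `ledger_history = 2250000` asks the node to backfill roughly 2.25 million ledgers behind the current tip. A back-of-the-envelope sketch of where backfill should eventually stop, using the validated sequence from the first server_info dump posted later in this thread (the exact boundary rippled targets may differ slightly):

```python
# Rough sanity check: with ledger_history = 2250000, the node should
# eventually hold ledgers back to about current_seq - ledger_history.
ledger_history = 2_250_000
current_seq = 53_247_708  # validated_ledger.seq from the first server_info dump
expected_oldest = current_seq - ledger_history + 1
print(expected_oldest)  # ~50,997,709: far below the ranges actually stored
```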

MarkusTeufelberger commented 4 years ago

Just wait while it backfills. server_state full just means that the server is fully synced to the current state of the ledger, not that the disk is full or that it has stopped fetching history from peers.

l4zygreed commented 4 years ago

The server has been up for nearly 24 hours; it has ~9 GB of data, and the complete_ledgers IDs are far higher than even on the v1.ripple.com node (if I understand correctly, complete_ledgers shows the range of ledger IDs available on the node).

To me, it doesn't look like it is syncing old ledgers at all.
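One way to quantify whether backfill is happening is to count the ledgers actually covered by the complete_ledgers string between two server_info calls. A minimal sketch of a parser for the comma-separated range format shown in the dumps below (the sample range here is a shortened, hypothetical excerpt):

```python
def count_complete(ranges: str) -> int:
    """Count ledgers in a complete_ledgers string like '100-200,250-300'."""
    total = 0
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        # A part with no '-' is a single ledger sequence.
        total += (int(hi) if hi else int(lo)) - int(lo) + 1
    return total

sample = "53221120-53243639,53243721-53244385"  # shortened illustrative excerpt
print(count_complete(sample))
```

Running this on the full strings from two snapshots an hour apart shows whether coverage is actually growing backwards or only tracking the advancing tip.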

MarkusTeufelberger commented 4 years ago

Please post the output of server_info from now and one hour later.

l4zygreed commented 4 years ago

Yesterday

{
   "result" : {
      "info" : {
         "build_version" : "1.4.0",
         "complete_ledgers" : "53221120-53243639,53243721-53244385,53244509-53244859,53244954-53245534,53245625-53245816,53245915-53246067,53246230-53246300,53246391-53246659,53246781-53247048,53247153-53247218,53247378-53247530,53247681-53247694",
         "hostid" : "787a6ec48b63",
         "io_latency_ms" : 1,
         "jq_trans_overflow" : "102",
         "last_close" : {
            "converge_time_s" : 2,
            "proposers" : 34
         },
         "load" : {
            "job_types" : [
               {
                  "in_progress" : 2,
                  "job_type" : "publishAcqLedger",
                  "waiting" : 3
               },
               {
                  "job_type" : "ledgerRequest",
                  "per_second" : 14
               },
               {
                  "job_type" : "untrustedProposal",
                  "per_second" : 42
               },
               {
                  "job_type" : "ledgerData",
                  "peak_time" : 8
               },
               {
                  "in_progress" : 1,
                  "job_type" : "clientCommand",
                  "per_second" : 10
               },
               {
                  "job_type" : "transaction",
                  "per_second" : 9
               },
               {
                  "job_type" : "batch",
                  "peak_time" : 1,
                  "per_second" : 7
               },
               {
                  "in_progress" : 1,
                  "job_type" : "advanceLedger",
                  "peak_time" : 3
               },
               {
                  "job_type" : "fetchTxnData",
                  "per_second" : 5
               },
               {
                  "avg_time" : 6416,
                  "job_type" : "writeAhead",
                  "over_target" : true,
                  "peak_time" : 25652
               },
               {
                  "job_type" : "trustedValidation",
                  "peak_time" : 2,
                  "per_second" : 7
               },
               {
                  "job_type" : "writeObjects",
                  "peak_time" : 2,
                  "per_second" : 27
               },
               {
                  "job_type" : "trustedProposal",
                  "per_second" : 14
               },
               {
                  "job_type" : "peerCommand",
                  "per_second" : 983
               },
               {
                  "job_type" : "processTransaction",
                  "per_second" : 9
               },
               {
                  "job_type" : "SyncReadNode",
                  "per_second" : 42
               },
               {
                  "job_type" : "AsyncReadNode",
                  "per_second" : 16
               },
               {
                  "job_type" : "WriteNode",
                  "per_second" : 188
               }
            ],
            "threads" : 6
         },
         "load_factor" : 1,
         "peer_disconnects" : "18",
         "peer_disconnects_resources" : "0",
         "peers" : 13,
         "pubkey_node" : "n9J5X8VF2SZL34sWkC3MiqyEptMJRqgS5LUTDJhUo7m8doFK42am",
         "pubkey_validator" : "none",
         "published_ledger" : 53247694,
         "server_state" : "full",
         "server_state_duration_us" : "529045218",
         "state_accounting" : {
            "connected" : {
               "duration_us" : "140571457",
               "transitions" : 1
            },
            "disconnected" : {
               "duration_us" : "2123066",
               "transitions" : 1
            },
            "full" : {
               "duration_us" : "91816386614",
               "transitions" : 100
            },
            "syncing" : {
               "duration_us" : "241532022",
               "transitions" : 100
            },
            "tracking" : {
               "duration_us" : "4",
               "transitions" : 100
            }
         },
         "time" : "2020-Feb-06 14:24:42.923473",
         "uptime" : 92201,
         "validated_ledger" : {
            "age" : 2,
            "base_fee_xrp" : 1e-05,
            "hash" : "A0CD758B3B646E9AA46C836BAE7E91D28B153FAF636A4DE39F3C11DB4F223B53",
            "reserve_base_xrp" : 20,
            "reserve_inc_xrp" : 5,
            "seq" : 53247708
         },
         "validation_quorum" : 28,
         "validator_list" : {
            "count" : 1,
            "expiration" : "2020-Feb-17 00:00:00.000000000",
            "status" : "active"
         }
      },
      "status" : "success"
   }
}

and today

{
   "result" : {
      "info" : {
         "build_version" : "1.4.0",
         "complete_ledgers" : "53221120-53243639,53243721-53244385,53244509-53244859,53244954-53245534,53245625-53245816,53245915-53246067,53246230-53246300,53246391-53246659,53246781-53247048,53247153-53247218,53247378-53247530,53247681-53247833,53247957-53248069,53248203-53248393,53248507-53248707,53248826-53249035,53249160-53249392,53249545-53249638,53249750-53249946,53250113-53250234,53250331-53250461,53250573-53250639,53250732-53250841,53250942-53251178,53251307-53251542,53251665-53251884,53252013-53252301,53252437-53252730,53252871-53253066,53253163-53253354,53253540-53253716,53253929-53254031,53254191-53254265,53254377-53254522,53254689-53254755,53254869-53254973,53255074-53255253,53255434-53255564,53255751-53255819,53255910-53255978,53256079-53256236,53256433-53256566,53256713-53256793,53256934-53257097,53257268-53257427,53257640-53257726,53257881-53257955,53258056-53258180,53258339-53258461,53258638-53258713,53258855-53258927,53259043-53259139,53259275-53259350,53259452-53259524,53259613-53259743,53259912-53260055,53260256-53260404,53260611-53260680,53260774-53260926,53261129-53261243,53261488-53261552,53261641-53261707,53261840-53261920,53262046-53262120,53262227-53262326,53262457-53262548,53262640-53262776,53262922-53263018,53263124-53263252,53263396-53263515,53263630-53263786,53263953-53264121,53264341-53264432,53264563-53264575",
         "hostid" : "787a6ec48b63",
         "io_latency_ms" : 1,
         "jq_trans_overflow" : "414",
         "last_close" : {
            "converge_time_s" : 3,
            "proposers" : 34
         },
         "load" : {
            "job_types" : [
               {
                  "in_progress" : 2,
                  "job_type" : "publishAcqLedger",
                  "waiting" : 6
               },
               {
                  "job_type" : "ledgerRequest",
                  "per_second" : 1
               },
               {
                  "job_type" : "untrustedProposal",
                  "per_second" : 13
               },
               {
                  "job_type" : "ledgerData",
                  "peak_time" : 5,
                  "per_second" : 2
               },
               {
                  "in_progress" : 1,
                  "job_type" : "clientCommand",
                  "per_second" : 8
               },
               {
                  "job_type" : "transaction",
                  "per_second" : 12
               },
               {
                  "job_type" : "batch",
                  "per_second" : 11
               },
               {
                  "avg_time" : 7,
                  "in_progress" : 1,
                  "job_type" : "advanceLedger",
                  "peak_time" : 463,
                  "per_second" : 1
               },
               {
                  "job_type" : "fetchTxnData",
                  "per_second" : 7
               },
               {
                  "in_progress" : 1,
                  "job_type" : "writeAhead"
               },
               {
                  "job_type" : "trustedValidation",
                  "per_second" : 6
               },
               {
                  "job_type" : "writeObjects",
                  "peak_time" : 3,
                  "per_second" : 70
               },
               {
                  "job_type" : "trustedProposal",
                  "per_second" : 12
               },
               {
                  "job_type" : "peerCommand",
                  "per_second" : 952
               },
               {
                  "job_type" : "processTransaction",
                  "per_second" : 12
               },
               {
                  "job_type" : "SyncReadNode",
                  "per_second" : 76
               },
               {
                  "job_type" : "AsyncReadNode",
                  "per_second" : 34
               },
               {
                  "job_type" : "WriteNode",
                  "per_second" : 249
               }
            ],
            "threads" : 6
         },
         "load_factor" : 1.953125,
         "load_factor_local" : 1.953125,
         "peer_disconnects" : "19",
         "peer_disconnects_resources" : "0",
         "peers" : 13,
         "pubkey_node" : "n9J5X8VF2SZL34sWkC3MiqyEptMJRqgS5LUTDJhUo7m8doFK42am",
         "pubkey_validator" : "none",
         "published_ledger" : 53264575,
         "server_state" : "full",
         "server_state_duration_us" : "840683633",
         "state_accounting" : {
            "connected" : {
               "duration_us" : "140571457",
               "transitions" : 1
            },
            "disconnected" : {
               "duration_us" : "2123066",
               "transitions" : 1
            },
            "full" : {
               "duration_us" : "157420402654",
               "transitions" : 166
            },
            "syncing" : {
               "duration_us" : "439469839",
               "transitions" : 180
            },
            "tracking" : {
               "duration_us" : "18964975",
               "transitions" : 180
            }
         },
         "time" : "2020-Feb-07 08:41:43.842388",
         "uptime" : 158022,
         "validated_ledger" : {
            "age" : 3,
            "base_fee_xrp" : 1e-05,
            "hash" : "1B00ADD310C5F7C47E74886F2A21460229E081E2E84407630BB91FB9C6BC4112",
            "reserve_base_xrp" : 20,
            "reserve_inc_xrp" : 5,
            "seq" : 53264583
         },
         "validation_quorum" : 28,
         "validator_list" : {
            "count" : 1,
            "expiration" : "2020-Feb-17 00:00:00.000000000",
            "status" : "active"
         }
      },
      "status" : "success"
   }
}
MarkusTeufelberger commented 4 years ago

These holes suggest that some part of your system is having a hard time keeping up. Are you running on (fast) SSDs?

l4zygreed commented 4 years ago

I'm running it on a VDS, but it should be pretty fast.

MarkusTeufelberger commented 4 years ago

I seriously doubt that. What does iostat -x 1 look like while the server is running? This sounds like #3051 and similar issues.

Maybe there should be a disk benchmark that measures rippled's actual usage pattern, so people can get a better idea of the performance to expect.
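As a very rough stand-in for such a benchmark, a minimal random-read sketch (all parameters are hypothetical; rippled's real NuDB/RocksDB access pattern is more complex, and since the scratch file was just written this mostly hits the page cache, so it overstates raw disk IOPS. A real benchmark would use O_DIRECT or a tool like fio):

```python
import os
import random
import tempfile
import time

def random_read_iops(size_mb: int = 64, block: int = 4096, reads: int = 2000) -> float:
    """Measure random 4 KiB read IOPS on a scratch file (rough proxy only)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(size_mb * 1024 * 1024))
        path = f.name
    try:
        fd = os.open(path, os.O_RDONLY)
        offsets = [random.randrange(0, size_mb * 1024 * 1024 - block)
                   for _ in range(reads)]
        start = time.perf_counter()
        for off in offsets:
            os.pread(fd, block, off)  # read `block` bytes at a random offset
        elapsed = time.perf_counter() - start
        os.close(fd)
        return reads / elapsed
    finally:
        os.unlink(path)

print(f"{random_read_iops():.0f} random-read IOPS (page-cache inflated)")
```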

l4zygreed commented 4 years ago

Output of iostat -x 1 while rippled is running:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    7.00    0.00    40.00     0.00    11.43     5.47  573.57  573.57    0.00 143.00 100.10
dm-0              0.00     0.00    8.00    0.00    44.00     0.00    11.00     5.47  501.88  501.88    0.00 125.12 100.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.50    0.00    1.00   93.23    0.00    4.26

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00   23.00    0.00   148.00     0.00    12.87     4.81  280.52  280.52    0.00  43.48 100.00
dm-0              0.00     0.00   22.00    0.00   144.00     0.00    13.09     4.81  293.27  293.27    0.00  45.45 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.75   98.49    0.00    0.25

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00   27.00    0.00   148.00     0.00    10.96     4.67  158.59  158.59    0.00  37.04 100.00
dm-0              0.00     0.00   27.00    0.00   144.00     0.00    10.67     4.67  158.59  158.59    0.00  37.04 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00    0.50   98.75    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00   15.00    0.00   104.00     0.00    13.87     5.00  311.13  311.13    0.00  66.67 100.00
dm-0              0.00     0.00   15.00    0.00   108.00     0.00    14.40     5.00  311.13  311.13    0.00  66.67 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.01    0.00    0.50   97.24    0.00    1.26

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00   24.00    0.00   164.00     0.00    13.67     4.66  227.67  227.67    0.00  41.67 100.00
dm-0              0.00     0.00   24.00    0.00   160.00     0.00    13.33     4.66  227.67  227.67    0.00  41.67 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.26    0.00    0.75   96.98    0.00    1.01

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00   12.00    0.00    76.00     0.00    12.67     4.69  362.08  362.08    0.00  83.33 100.00
dm-0              0.00     0.00   12.00    0.00    80.00     0.00    13.33     4.69  362.08  362.08    0.00  83.33 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00    0.50   98.75    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00   15.00    0.00    92.00     0.00    12.27     5.00  338.87  338.87    0.00  66.67 100.00
dm-0              0.00     0.00   15.00    0.00    84.00     0.00    11.20     5.00  338.87  338.87    0.00  66.67 100.00
MarkusTeufelberger commented 4 years ago

Well, there's your problem: just look at the iowait percentages. On my node they sometimes spike to 5%; yours are at a constant 95%+.

You need to switch to a storage solution with far more random read/write IOPS. This looks like a hardware issue, not something in the software.
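To put a single number on a capture like the one above, a small sketch that averages the %iowait column from saved iostat output (assuming the two-line avg-cpu header/values layout shown in the paste):

```python
def mean_iowait(iostat_text: str) -> float:
    """Average the %iowait field across iostat 'avg-cpu' samples."""
    values = []
    lines = iostat_text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("avg-cpu"):
            # Values line follows the header; field order is:
            # %user %nice %system %iowait %steal %idle
            fields = lines[i + 1].split()
            values.append(float(fields[3]))
    return sum(values) / len(values)
```

Feeding it the capture above yields a mean %iowait in the mid-90s, which is what makes the verdict so clear-cut.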

mDuo13 commented 4 years ago

Seconding what @MarkusTeufelberger said. Your config file stanzas look OK, but gaps in complete_ledgers almost always indicate insufficient disk I/O, insufficient RAM, or an inadequate network connection.

l4zygreed commented 4 years ago

Yeah, changed the disk to an SSD and now it works fine, thx.