XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License

High CPU usage [v 1.6.0] #3767

Open tuloski opened 3 years ago

tuloski commented 3 years ago

I have a non-validating node running v1.6.0 (with similar performance on 1.5.0) with the following specs:

CPU usage for rippled alone is consistently between 110% and 350%.

This seems like abnormal CPU usage.

The following is the rippled.cfg:

[server]
port_rpc_admin_local
port_peer
port_ws_admin_local

[port_rpc_admin_local]
port = 5005
ip = 127.0.0.1
admin = 127.0.0.1
protocol = http

[port_peer]
port = 51235
ip = 0.0.0.0
protocol = peer

[port_ws_admin_local]
port = 6006
ip = 127.0.0.1
admin = 127.0.0.1
protocol = ws

[node_size]
small

[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb
open_files=2000
filter_bits=12
cache_mb=256
file_size_mb=8
file_size_mult=2
online_delete=512
advisory_delete=0

[database_path]
/var/lib/rippled/db

[debug_logfile]
/var/log/rippled/debug.log

[sntp_servers]
time.windows.com
time.apple.com
time.nist.gov
pool.ntp.org

[validators_file]
validators.txt

[rpc_startup]
{ "command": "log_level", "severity": "warning" }

[ssl_verify]
1

The following is the server_info answer:

{
   "result" : {
      "info" : {
         "build_version" : "1.5.0",
         "complete_ledgers" : "55162159-55162719",
         "hostid" : "xxx",
         "io_latency_ms" : 1,
         "jq_trans_overflow" : "0",
         "last_close" : {
            "converge_time_s" : 3.675,
            "proposers" : 36
         },
         "load" : {
            "job_types" : [
               {
                  "avg_time" : 171,
                  "job_type" : "ledgerRequest",
                  "peak_time" : 1100,
                  "per_second" : 2
               },
               {
                  "avg_time" : 66,
                  "job_type" : "untrustedProposal",
                  "peak_time" : 749,
                  "per_second" : 30
               },
               {
                  "avg_time" : 37,
                  "job_type" : "ledgerData",
                  "peak_time" : 639,
                  "per_second" : 3
               },
               {
                  "avg_time" : 3,
                  "in_progress" : 2,
                  "job_type" : "clientCommand",
                  "peak_time" : 91,
                  "per_second" : 4
               },
               {
                  "avg_time" : 54,
                  "job_type" : "transaction",
                  "peak_time" : 669,
                  "per_second" : 3
               },
               {
                  "avg_time" : 13,
                  "job_type" : "batch",
                  "peak_time" : 342,
                  "per_second" : 1
               },
               {
                  "avg_time" : 34,
                  "job_type" : "advanceLedger",
                  "peak_time" : 459,
                  "per_second" : 8
               },
               {
                  "avg_time" : 18,
                  "job_type" : "fetchTxnData",
                  "peak_time" : 790,
                  "per_second" : 3
               },
               {
                  "avg_time" : 122,
                  "job_type" : "trustedValidation",
                  "peak_time" : 915,
                  "per_second" : 8
               },
               {
                  "in_progress" : 1,
                  "job_type" : "acceptLedger"
               },
               {
                  "avg_time" : 34,
                  "job_type" : "trustedProposal",
                  "peak_time" : 423,
                  "per_second" : 13
               },
               {
                  "in_progress" : 1,
                  "job_type" : "sweep"
               },
               {
                  "avg_time" : 173,
                  "job_type" : "heartbeat",
                  "peak_time" : 663
               },
               {
                  "job_type" : "peerCommand",
                  "peak_time" : 13,
                  "per_second" : 535
               },
               {
                  "job_type" : "processTransaction",
                  "per_second" : 3
               },
               {
                  "job_type" : "SyncReadNode",
                  "peak_time" : 346,
                  "per_second" : 5445
               },
               {
                  "job_type" : "AsyncReadNode",
                  "peak_time" : 5,
                  "per_second" : 1322
               },
               {
                  "job_type" : "WriteNode",
                  "peak_time" : 16,
                  "per_second" : 1993
               }
            ],
            "threads" : 6
         },
         "load_factor" : 1,
         "peer_disconnects" : "439",
         "peer_disconnects_resources" : "0",
         "peers" : 10,
         "pubkey_node" : "xxx",
         "pubkey_validator" : "none",
         "server_state" : "full",
         "server_state_duration_us" : "1314268034",
         "state_accounting" : {
            "connected" : {
               "duration_us" : "13697771492",
               "transitions" : 15030
            },
            "disconnected" : {
               "duration_us" : "3928594039",
               "transitions" : 28
            },
            "full" : {
               "duration_us" : "1092198327089",
               "transitions" : 21439
            },
            "syncing" : {
               "duration_us" : "18600613786",
               "transitions" : 6485
            },
            "tracking" : {
               "duration_us" : "4365361939",
               "transitions" : 21444
            }
         },
         "time" : "2020-May-01 10:37:37.054364 UTC",
         "uptime" : 1132790,
         "validated_ledger" : {
            "age" : 6,
            "base_fee_xrp" : 1e-05,
            "hash" : "41C71CA868653CCB19475EED1253267632F31734A0772E5E79373F625F64E5CB",
            "reserve_base_xrp" : 20,
            "reserve_inc_xrp" : 5,
            "seq" : 55162719
         },
         "validation_quorum" : 29,
         "validator_list" : {
            "count" : 1,
            "expiration" : "2020-Jun-02 00:00:00.000000000 UTC",
            "status" : "active"
         }
      },
      "status" : "success"
   }
}
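The load.job_types array above is the quickest way to spot where the CPU is going. A small sketch (a hypothetical helper, not part of rippled) that ranks job types by their per_second rate from a server_info response:

```python
# Excerpt of the "load" section from the server_info output above; in
# practice you would parse the full JSON response with json.loads().
server_info = {
    "result": {
        "info": {
            "load": {
                "job_types": [
                    {"job_type": "peerCommand", "peak_time": 13, "per_second": 535},
                    {"job_type": "SyncReadNode", "peak_time": 346, "per_second": 5445},
                    {"job_type": "AsyncReadNode", "peak_time": 5, "per_second": 1322},
                    {"job_type": "WriteNode", "peak_time": 16, "per_second": 1993},
                ]
            }
        }
    }
}

def busiest_jobs(info, n=3):
    """Return the n job types with the highest per_second rate."""
    jobs = info["result"]["info"]["load"]["job_types"]
    ranked = sorted(jobs, key=lambda j: j.get("per_second", 0), reverse=True)
    return [(j["job_type"], j.get("per_second", 0)) for j in ranked[:n]]

# In the dump above, the node-store jobs (SyncReadNode, WriteNode,
# AsyncReadNode) dominate, which points at the [node_db] settings.
print(busiest_jobs(server_info))
```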
nbougalis commented 3 years ago

Two comments:

First, you are still running version 1.5.0. The current release is 1.6.0 and the 1.7.0 build should be going out in the next few days.

Second, and perhaps more important, is that you are running online delete far too frequently. I don't think that's the only reason behind the high CPU usage, but it probably doesn't help. I'd recommend increasing the online_delete interval to around 25000 and restarting your server.
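For reference, the suggested change would look like this in rippled.cfg, keeping the other [node_db] values from the config posted above (25000 ledgers is roughly a day of history at ~4 s per ledger):

```ini
[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb
# Retain ~25000 validated ledgers before pruning, instead of 512.
online_delete=25000
advisory_delete=0
```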

tuloski commented 3 years ago

Out of laziness I copied the data I had posted on XRPchat. I'm now running 1.6.0 with a higher online_delete and I'm in the same situation. As soon as 1.7.0 is out, I'll update this issue.

ximinez commented 3 years ago

Another issue is that your node_size is small. Since you have 16 GB of RAM, medium will probably work better for you. As a rule of thumb, the node_size setting trades RAM for CPU: if you save RAM by using a smaller size, you'll use more CPU, and vice versa. You said you're running other things on the machine, but if they're small, they probably won't cause any problems. See https://xrpl.org/capacity-planning.html#node-size for more information.
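As a rippled.cfg fragment, the suggested change is just:

```ini
# Roughly: tiny/small -> less RAM, more CPU; large/huge -> more RAM, less CPU.
# See https://xrpl.org/capacity-planning.html#node-size for sizing guidance.
[node_size]
medium
```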

Additionally, 1.7.0 drastically reduces memory consumption. My local huge node uses only 8 GB of memory for the entire system with rc2. If you're still having trouble after upgrading and want to experiment, you could try huge and see how it does.

tuloski commented 3 years ago

I was using small because I also had RAM issues before 1.5.0 (around 80% used). It improved with 1.5 and 1.6, and since 1.7 will be better still, that's a nice tip.

tuloski commented 3 years ago

Upgraded to 1.7.0 with node_size medium. It's using 3.6 GB of RAM, but the CPU usage is VERY high.

CPU usage was much better on 1.6.0 after switching to node_size medium, but it was using 9+ GB of RAM.

ximinez commented 3 years ago

When you say "very high", do you mean that the CPU graphs are consistently running above some value, or do you mean that rippled is pegging your hardware such that it can't keep up with the network? Or do you mean something else?

As I mentioned above, node_size trades CPU usage for memory. Even though huge is intended for systems with 32 GB of RAM or more, actual usage in 1.7 is significantly less than that, so it might be fine on your 16 GB system. If rippled is having problems, you can try using huge.

tuloski commented 3 years ago

By very high I mean average CPU usage above 60% on all 4 cores, with higher peaks. The node can keep up with the network, but electricity usage is higher, the machine runs hotter, and the fan is constantly on. That's a problem I didn't have with some previous versions, though maybe it comes down to a delicate choice of parameters.

With 1.7.0 and node_size huge, RAM is around 4 GB and average CPU usage seems to be about 30% on all cores, which is nice. Going from medium to huge used only a tiny amount of extra memory with a big improvement in CPU. I'll keep you updated, since the stats usually change over time.