XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License
4.51k stars 1.46k forks source link

rippled account_info lgrNotFound error. #2961

Closed dmitriano closed 3 years ago

dmitriano commented 5 years ago

I made an attempt to run rippled both versions 1.1.2 and 1.2.4 on Ubuntu 18.04 and I would not say it does not work at all, because it utilizes CPU and Disk, creates database of 3GB size, and I even was able to create a wallet from the command line, but if I do

./rippled account_info r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59 true

I always get something similar to the result described in this post:

Loading: "/home/xrp/.config/ripple/rippled.cfg"
2019-May-29 10:04:10.273909186 HTTPClient:NFO Connecting to 127.0.0.1:5005

{
   "result" : {
      "error" : "lgrNotFound",
      "error_code" : 21,
      "error_message" : "ledgerNotFound",
      "request" : {
         "account" : "r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59",
         "command" : "account_info",
         "ledger_index" : 0
      },
      "status" : "error"
   }
}

I am not sure the error is caused by the low hardware performance. I run rippled on virtual machine with SSD and 16GB RAM, disk IO rate periodically riches 40+M/s, but my CPU is Intel Celeron 2.8GHz with 2 cores, periodically CPU usage reaches 100%, can CPU be the bottleneck? See the screenshot.

What else can be wrong?

See Building and running rippled on Ubuntu for more information on what steps I did.

MarkusTeufelberger commented 5 years ago

Looks to me like you have a slow SSD and the weak CPU probably doesn't help either... do you do continuous monitoring of that machine (e.g. wih Prometheus) or are you running blindly?

You also should check if you are actually able to receive incoming connections on your peer port, since you only have 10 peers.

Why do you want to store history on a validator or why do you want to validate if you want to use a server as RPC endpoint?

Lastly according to your screenshots, you are also running another very IO heavy program (geth) on the same machine. I suspect that this also will not help, performance wise.

dmitriano commented 5 years ago

@MarkusTeufelberger See my SSD benchmark results, is it OK? Currently there is no monitoring software on this machine.

I opened port 51235 for incoming connections and now I have about 20 peers.

I thought that validator may help, but if I do not need it I switch if off. Does it affect the performance?

'geth' is in 'light' mode, so its unlikely it consume resources, however it can be turned off to make a clean experiment.

But, I still have the same 'lgrNotFound' error. What to do next? Install Prometheus and/or probably upgrade CPU?

nbougalis commented 5 years ago

I love the writeup you link to. Thanks for taking the time to compile, and try out rippled!

On first glance, your SSD seems like it should be plenty fast. Are you running this in a VM or cloud environment? Some of them impose restrictions.

The output from your server_state calls suggests you synced up to the network and, at some point, were tracking it pretty well but then fell behind; see the age of the validated_ledger object: it should never really be more than about 7 seconds on a healthy server.

It's hard to know why you fell so far behind. This can be for a number of reasons, connectivity being one of them, although your server seems to have had 10 peers at the time. It's possible that it's disk I/O.

I don't think upgrading the CPU is needed at this time. I don't think will help and I doubt you're truly CPU-bound. My educated guess is that you're running on a virtual environment.

Can you explain your setup a little more?

dmitriano commented 5 years ago

@nbougalis I am running rippled on Ubuntu 18.04 Desktop inside Hyper-V virtual machine on Windows 10 Pro (version 1903). See my Ubuntu system summary (for some reason inxi unable to detect the network card). Hyper-V has a typical configuration with 2 Processors, 16GB RAM (with disabled Dynamic Memory), about 200GB hard drive and External Network adapter (with IP address visible in the local network).

I left rippled running for several days and it stopped working at all, at least the commands from the command line hangs up for a long time and then show this:

 {
   "error" : "internal",
   "error_code" : 71,
   "error_message" : "Internal error.",
   "error_what" : "no response from server"
}

and there were multiple occurrences of a string like this in logs:

rippled[687]: 2019-Jun-15 22:54:30.584170705 LoadManager:WRN Server stalled for 31820 seconds.

htop displayed me this.

I tried to restart the service with sudo service rippled restart and got Failed to restart rippled.service: Unit rippled.service not found. (I am not a linux expert and I do not know exactly what does this mean).

At the same time geth (Ethereum node) running on the same virtual machine along with rippled worked fine.

I rebooted the machine and rippled started to work normally as before, see htop screenshot.

Right after reboot I run ./rippled server_info multiple times and age varied between 0 and 11. If I run ./rippled server_state there is no age at all. The number of peers was 10 - 13, within some time it reached 21 (peer port is open for incoming connections). But still "lgrNotFound" error. See the entire output of server_info below:

{
   "result" : {
      "info" : {
         "build_version" : "1.2.4",
         "complete_ledgers" : "47998296-48001809",
         "hostid" : "bank",
         "io_latency_ms" : 1,
         "jq_trans_overflow" : "0",
         "last_close" : {
            "converge_time_s" : 3.002,
            "proposers" : 31
         },
         "load" : {
            "job_types" : [
               {
                  "avg_time" : 26,
                  "job_type" : "untrustedValidation",
                  "peak_time" : 414,
                  "per_second" : 1
               },
               {
                  "avg_time" : 49,
                  "job_type" : "ledgerRequest",
                  "peak_time" : 190
               },
               {
                  "avg_time" : 12,
                  "job_type" : "untrustedProposal",
                  "peak_time" : 249,
                  "per_second" : 43
               },
               {
                  "avg_time" : 20,
                  "in_progress" : 1,
                  "job_type" : "ledgerData",
                  "peak_time" : 220,
                  "per_second" : 1
               },
               {
                  "in_progress" : 1,
                  "job_type" : "clientCommand",
                  "peak_time" : 18,
                  "per_second" : 8
               },
               {
                  "avg_time" : 6,
                  "job_type" : "transaction",
                  "peak_time" : 150,
                  "per_second" : 7
               },
               {
                  "job_type" : "batch",
                  "peak_time" : 19,
                  "per_second" : 7
               },
               {
                  "avg_time" : 5,
                  "job_type" : "advanceLedger",
                  "peak_time" : 562,
                  "per_second" : 7
               },
               {
                  "job_type" : "fetchTxnData",
                  "peak_time" : 3,
                  "per_second" : 5
               },
               {
                  "avg_time" : 7,
                  "job_type" : "trustedValidation",
                  "peak_time" : 381,
                  "per_second" : 7
               },
               {
                  "job_type" : "writeObjects",
                  "peak_time" : 222,
                  "per_second" : 72
               },
               {
                  "avg_time" : 244,
                  "job_type" : "acceptLedger",
                  "peak_time" : 769
               },
               {
                  "avg_time" : 4,
                  "job_type" : "trustedProposal",
                  "peak_time" : 123,
                  "per_second" : 14
               },
               {
                  "avg_time" : 1,
                  "job_type" : "heartbeat",
                  "peak_time" : 3
               },
               {
                  "job_type" : "peerCommand",
                  "peak_time" : 15,
                  "per_second" : 1447
               },
               {
                  "job_type" : "diskAccess",
                  "peak_time" : 114,
                  "per_second" : 7
               },
               {
                  "job_type" : "processTransaction",
                  "per_second" : 7
               },
               {
                  "job_type" : "SyncReadNode",
                  "peak_time" : 11,
                  "per_second" : 93
               },
               {
                  "job_type" : "AsyncReadNode",
                  "peak_time" : 57,
                  "per_second" : 2818
               },
               {
                  "job_type" : "WriteNode",
                  "peak_time" : 16,
                  "per_second" : 444
               }
            ],
            "threads" : 4
         },
         "load_factor" : 1,
         "peer_disconnects" : "452",
         "peer_disconnects_resources" : "0",
         "peers" : 21,
         "pubkey_node" : "n9MVgjp3rEh6UAWHMA4MwQRerSzmyJb5toQqUaMMUcdXrf6ruXsU",
         "pubkey_validator" : "nHUQnU65NwZh8eaHEvJit6R4V5gMySCAaWNW1a84zZtbwzFUzV7E",
         "server_state" : "proposing",
         "server_state_duration_us" : "20533213659",
         "state_accounting" : {
            "connected" : {
               "duration_us" : "493346490",
               "transitions" : 1
            },
            "disconnected" : {
               "duration_us" : "3861135",
               "transitions" : 1
            },
            "full" : {
               "duration_us" : "20585527352",
               "transitions" : 2
            },
            "syncing" : {
               "duration_us" : "9803805",
               "transitions" : 2
            },
            "tracking" : {
               "duration_us" : "1",
               "transitions" : 2
            }
         },
         "time" : "2019-Jun-16 05:43:46.894292",
         "uptime" : 21094,
         "validated_ledger" : {
            "age" : 5,
            "base_fee_xrp" : 1e-05,
            "hash" : "2CFA7AF7FB8D90BF0A54CCECC7D7475BF3F92B3C5B79AB93EA140A2B29477546",
            "reserve_base_xrp" : 20,
            "reserve_inc_xrp" : 5,
            "seq" : 48001809
         },
         "validation_quorum" : 26,
         "validator_list" : {
            "count" : 1,
            "expiration" : "2019-Jul-17 00:00:00.000000000",
            "status" : "active"
         }
      },
      "status" : "success"
   }
}

logs:

июн 16 08:50:12 bank rippled[671]: 2019-Jun-16 05:50:12.148396391 LoadMonitor:WRN Job: recvPropose->checkPropose run: 0ms wait: 1046ms
июн 16 08:50:12 bank rippled[671]: 2019-Jun-16 05:50:12.148658589 LoadMonitor:WRN Job: recvPropose->checkPropose run: 0ms wait: 1014ms
июн 16 08:50:12 bank rippled[671]: 2019-Jun-16 05:50:12.138495077 LoadMonitor:WRN Job: processLedgerData run: 547ms wait: 1764ms
июн 16 08:50:12 bank rippled[671]: 2019-Jun-16 05:50:12.153961643 LoadMonitor:WRN Job: recvGetLedger run: 0ms wait: 1651ms
июн 16 08:50:12 bank rippled[671]: 2019-Jun-16 05:50:12.154039442 LoadMonitor:WRN Job: recvGetLedger run: 0ms wait: 1248ms
июн 16 08:50:12 bank rippled[671]: 2019-Jun-16 05:50:12.154127541 LoadMonitor:WRN Job: recvPropose->checkPropose run: 0ms wait: 1051ms
июн 16 08:50:15 bank rippled[671]: 2019-Jun-16 05:50:15.385763539 LoadMonitor:WRN Job: processLedgerData run: 1373ms wait: 23ms
июн 16 08:50:15 bank rippled[671]: 2019-Jun-16 05:50:15.385922838 LoadMonitor:WRN Job: processLedgerData run: 1563ms wait: 2ms
июн 16 08:50:16 bank rippled[671]: 2019-Jun-16 05:50:16.665767409 LoadMonitor:WRN Job: sweep run: 1212ms wait: 0ms
июн 16 08:50:19 bank rippled[671]: 2019-Jun-16 05:50:19.931940306 LoadMonitor:WRN Job: processLedgerData run: 1080ms wait: 0ms
июн 16 08:50:24 bank rippled[671]: 2019-Jun-16 05:50:24.080542031 LoadMonitor:WRN Job: processLedgerData run: 1016ms wait: 0ms
июн 16 08:50:26 bank rippled[671]: 2019-Jun-16 05:50:26.881391575 LoadMonitor:WRN Job: acceptLedger run: 1046ms wait: 1ms
июн 16 08:50:28 bank rippled[671]: 2019-Jun-16 05:50:28.553912240 LoadMonitor:WRN Job: sweep run: 1706ms wait: 177ms
июн 16 08:50:28 bank rippled[671]: 2019-Jun-16 05:50:28.666231439 LoadMonitor:WRN Job: processLedgerData run: 2255ms wait: 138ms
июн 16 08:50:30 bank rippled[671]: 2019-Jun-16 05:50:30.973262804 LoadMonitor:WRN Job: acceptLedger run: 1078ms wait: 1ms
июн 16 08:50:32 bank rippled[671]: 2019-Jun-16 05:50:32.182103193 LoadMonitor:WRN Job: processLedgerData run: 1965ms wait: 0ms
июн 16 08:50:34 bank rippled[671]: 2019-Jun-16 05:50:34.670534265 LoadMonitor:WRN Job: acceptLedger run: 1659ms wait: 0ms
июн 16 08:50:35 bank rippled[671]: 2019-Jun-16 05:50:35.080997341 LoadMonitor:WRN Job: advanceLedger run: 1172ms wait: 4ms
июн 16 08:50:35 bank rippled[671]: 2019-Jun-16 05:50:35.761887091 LoadMonitor:WRN Job: processLedgerData run: 2092ms wait: 0ms

I measured the internet speed with https://www.speedtest.net/, see the screenshot of the results.

dmitriano commented 5 years ago

@nbougalis For the last two days 'rippled' worked normally with the age in the rage [0, 8] and 21 peers, but with the same "lgrNotFound" error.

I am not sure if it matters or not, but today I noticed that validators expiration date is not in future, see the full output of ./rippled validators command. At the moment of running the command the server date/time was 2019-Jul-17 12:42:16 UTC, but the expiration date was 2019-Jul-17 00:00:00

dmitriano commented 5 years ago

I tried the same account_info command with s1.ripple.com and s2.ripple.com and got the same "lgrNotFound" error:

./rippled -v --rpc_ip 34.213.185.56:51234 account_info r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59 true

Loading: "/home/guber/.config/ripple/rippled.cfg"
2019-Jun-17 14:23:44.826270902 RPCParser:TRC Method: 'account_info'
2019-Jun-17 14:23:44.826345101 RPCParser:TRC Params: [ "r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59", "true" ]

2019-Jun-17 14:23:44.826364401 RPCParser:TRC RPC Request:
{
        "account" : "r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59",
        "ledger_index" : 0
}

2019-Jun-17 14:23:44.826431800 HTTPClient:NFO Connecting to 34.213.185.56:51234

2019-Jun-17 14:23:44.826492499 HTTPClient:TRC Fetch: 34.213.185.56
2019-Jun-17 14:23:44.826536799 HTTPClient:TRC expires_from_now: Success
2019-Jun-17 14:23:44.826550999 HTTPClient:TRC Resolving: 34.213.185.56
2019-Jun-17 14:23:44.826761896 HTTPClient:TRC Resolve complete.
2019-Jun-17 14:23:45.037269225 HTTPClient:TRC Connected.
2019-Jun-17 14:23:45.037315124 HTTPClient:TRC Session started.
2019-Jun-17 14:23:45.037321424 HTTPClient:DBG requestRPC: strPath=''
2019-Jun-17 14:23:45.037367524 HTTPClient:TRC Wrote.
2019-Jun-17 14:23:45.248605244 HTTPClient:TRC Header: "HTTP/1.1 200 OK
Date: Mon, 17 Jun 2019 14:23:45 +0000
Connection: Keep-Alive
Content-Length: 204
Content-Type: application/json; charset=UTF-8
Server: ripple-json-rpc/rippled-1.2.4

{"result":{"error":"lgrNotFound","error_code":21,"error_message":"ledgerNotFound","request":{"account":"r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59","command":"account_info","ledger_index":0},"status":"error"}}

"
2019-Jun-17 14:23:45.248702543 HTTPClient:DBG invokeComplete: Deadline popping: 1
2019-Jun-17 14:23:45.248726343 HTTPClient:DBG RPC reply: {"result":{"error":"lgrNotFound","error_code":21,"error_message":"ledgerNotFound","request":{"account":"r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59","command":"account_info","ledger_index":0},"status":"error"}}

2019-Jun-17 14:23:45.248752043 HTTPClient:TRC Deadline cancelled.
{
   "result" : {
      "error" : "lgrNotFound",
      "error_code" : 21,
      "error_message" : "ledgerNotFound",
      "request" : {
         "account" : "r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59",
         "command" : "account_info",
         "ledger_index" : 0
      },
      "status" : "error"
   }

But

./rippled -v --rpc_ip 34.213.185.56:51234 server_info

works fine, for example and I was able to run 'account_info' command with JSON-RPC using curl

What does this mean?

mDuo13 commented 5 years ago

I'm not sure why account_info would work over JSON-RPC with curl but not with the commandline. The commandline is just an interface to the JSON-RPC anyway. The only things I can think of are:

I suspect it's the former, in which case, that much is a known issue, but also... I guess one of your servers doesn't have the necessary ledger history?

dmitriano commented 5 years ago

@mDuo13 Looks like the command like arguments were wrong. If I replace 'true' with 'strict' it works:

./rippled account_info r9cZA1mLK5R5Am25ArfXFmqgNwjZgnfk59 strict

carlhua commented 3 years ago

Closing this issue.