XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License

Reporting Mode Dependencies #3776

Open treisto opened 3 years ago

treisto commented 3 years ago

Hi, what about adding "autoconf" to the list of prerequisites here?

Use apt-get to install the dependencies provided by the distribution

$ apt-get update
$ apt-get install -y gcc g++ wget git cmake pkg-config protobuf-compiler libprotobuf-dev libssl-dev

While compiling v1.7 for ubuntu I got this error:

Scanning dependencies of target krb5_src
[ 29%] Performing configure step for 'krb5_src'
/bin/sh: 1: **autoreconf: not found**
make[2]: *** [CMakeFiles/krb5_src.dir/build.make:108: ../.nih_c/unix_makefiles/GNU_9.3.0/Release/src/krb5_src-stamp/krb5_src-configure] Error 127
make[1]: *** [CMakeFiles/Makefile2:485: CMakeFiles/krb5_src.dir/all] Error 2

The error was solved by installing autoconf: `sudo apt-get install autoconf`

Best, Lucian

MarkusTeufelberger commented 3 years ago

Which distribution is this? Please post the output of cat /etc/os-release

treisto commented 3 years ago

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

cjcobb23 commented 3 years ago

@treisto Somehow you are trying to build the software in a special mode called reporting mode. You are right that you need autoconf for that, as well as a few other things. Did you configure cmake with -Dreporting=ON? It's supposed to default to off. Regardless, to get around this, reconfigure cmake with -Dreporting=OFF and rebuild. Or, if you do want to build reporting mode, install autoconf, as well as bison and flex.
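If it is unclear whether the extra tools are already installed, a quick check like the following (plain shell, no rippled-specific assumptions) lists which ones are missing:

```shell
# Check for the extra build tools reporting mode needs: autoconf provides
# autoreconf (the "not found" command in the error above); bison and flex
# are also required. Install any missing ones with:
#   sudo apt-get install -y autoconf bison flex
for tool in autoreconf bison flex; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```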

treisto commented 3 years ago

Yep, you are right. "Reporting" was intentionally switched to ON, and indeed bison and flex were also needed.

cjcobb23 commented 3 years ago

Great. Just curious, are you trying to run the software in reporting mode? It's a very new feature, and unfortunately we are lacking some documentation for it, but I'd love to help you if you are trying to test it out.

treisto commented 3 years ago

Yes, thank you very much, it'd be great if you could. I was indeed wondering where I could get some more info about this: what it is intended to do, more specifically how it works, and how to use it, for example.

cjcobb23 commented 3 years ago

We are working on generating detailed documentation right now. I can give you a quick how-to right here though:

  1. Install and run PostgreSQL on your machine. Create a user/role and a database. Here are instructions for Ubuntu 20.04, but it might be slightly different on other platforms: https://www.digitalocean.com/community/tutorials/how-to-install-postgresql-on-ubuntu-20-04-quickstart
  2. Copy your rippled.cfg file so that you have two (currently identical) config files. You are going to run two rippleds on one machine.
  3. Change the [node_db] section in the second config file, setting type to NuDB and database_path to something different than the database path in the first config file. Delete any online_delete config info, if present.
  4. Add a port_grpc section to your first config file. The section should look like this:
    [port_grpc]
    ip = 0.0.0.0
    port = 50051
    secure_gateway = 127.0.0.1
  5. If you don't have one already, add a port to listen for unencrypted websocket connections in the first config file. Be sure to also add the section name to the server section (you can keep whatever is already there as well):
    
    [server]
    port_ws_admin_local

    [port_ws_admin_local]
    port = 6006
    ip = 127.0.0.1
    admin = 127.0.0.1
    protocol = ws


6. Add the following sections to your **second** config file:

[reporting]
etl_source

[etl_source]
source_ip=127.0.0.1
source_ws_port=6006
source_grpc_port=50051

Make sure `source_grpc_port` matches the port in the `[port_grpc]` section from step 4, and `source_ws_port` matches the websocket port from step 5.
7. Add the following section to the **second** config file:

[ledger_tx_tables]
conninfo = postgres://[username]:[password]@127.0.0.1/[database]


Fill in the appropriate parts of the above string with the user, password and database you created when setting up PostgreSQL in step 1.
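To avoid typos in the connection string, it can help to assemble it from its pieces and eyeball it before editing the config. The user, password and database names below are placeholders, not values from this thread:

```shell
# Placeholders: substitute the role, password and database from step 1.
PGUSER=rippled
PGPASS=secret
PGDB=reporting
CONNINFO="postgres://${PGUSER}:${PGPASS}@127.0.0.1/${PGDB}"
echo "$CONNINFO"
# Before putting the string into [ledger_tx_tables], it can be checked with:
#   psql "$CONNINFO" -c 'SELECT 1;'
```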
8. Change any port numbers in the **second** config file to avoid any conflicts with the **first** config file (each rippled needs to use different ports).
9. Change the `[debug_logfile]` section in the **second** config file so it doesn't conflict with the first.
10. Build rippled in reporting mode. Launch one rippled using the **first** config:
`./rippled --conf=first.cfg`. Launch the second using the **second** config: `./rippled --conf=second.cfg`.
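Step 8's "no shared ports" requirement can be checked mechanically. A self-contained sketch (the file names `first.cfg` and `second.cfg` are hypothetical, and toy fragments are written out so the example runs as-is):

```shell
# Write two toy config fragments, then list any "port = N" value that
# appears in both files; any line printed is a conflict to fix.
cat > first.cfg <<'EOF'
[port_ws_admin_local]
port = 6006
[port_grpc]
port = 50051
EOF
cat > second.cfg <<'EOF'
[port_ws_admin_local]
port = 6007
[port_grpc]
port = 50051
EOF
grep -h '^port' first.cfg second.cfg | sort | uniq -d
# → port = 50051
```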

The second rippled will be running in reporting mode. These two servers talk to each other. The server running in reporting mode is optimized for RPC queries. It uses PostgreSQL instead of SQLite, and guarantees that there are no gaps in history. You can also use Apache Cassandra instead of NuDB for reporting mode servers, but that is intended to be used by organizations running public nodes that receive a lot of RPC traffic.
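Since the reporting server is the one optimized for RPC queries, a simple way to poke at either server is the standard `server_info` call over JSON-RPC. The port below is hypothetical and should match whichever JSON-RPC port the relevant config exposes:

```shell
# Hypothetical port; point this at the JSON-RPC port of either rippled.
PORT=5005
PAYLOAD='{"method":"server_info","params":[{}]}'
echo "$PAYLOAD"
# With a server running, the request itself would be:
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -d "$PAYLOAD" "http://127.0.0.1:${PORT}"
```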

Currently, we are in the first iteration of this project. Subsequent iterations will bring even more optimizations to the RPC system.

Let me know if you have any more questions.

cjcobb23 commented 3 years ago

If you use Cassandra, you can run multiple reporting servers that share access to the same database.

treisto commented 3 years ago

Wow! Thank you so much for all these details! If you'd be so kind, I do still have a few questions, please.

Thank you very much once again!

cjcobb23 commented 3 years ago

I'd be happy to provide more details. Hopefully soon we can have all of this information in a blog post or something. By the way, we are calling stock nodes and validators p2p nodes now.

There are two main reasons for creating this new mode:

A server running in reporting mode is neither a validator nor a stock node. It does not connect to the p2p network, participate in consensus or apply transactions. Instead, the reporting node connects to a p2p node (a stock node or a validator), and extracts transactions as well as their computed result from the p2p node. It then writes the transactions and results to the database. A reporting node only extracts validated data; it has no knowledge of the open ledger or transactions that have not been validated. For RPCs that require access to the open ledger, such as fee or submit, the reporting node forwards the request to a p2p node. Lastly, a reporting node does not ever have ledger gaps.

[ledger_tx_tables] is just configuration information about the transaction database. The transaction database is a relational database that supports tx and account_tx. In reporting mode, this is implemented in PostgreSQL, but a p2p node uses SQLite for this. There is an additional parameter called use_tx_tables in that section. When running in p2p mode, setting this parameter to 0 (it defaults to 1) disables writing to the transaction database, which improves throughput but disables the tx and account_tx RPCs. This is only for p2p mode though; reporting mode always writes to the transaction database, and will error out if you try to set this to 0.
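As a sketch of the comment above, a p2p node that does not need tx or account_tx could carry a section like this (the parameter name is taken from the explanation; whether any other keys belong in the section depends on your setup):

```
[ledger_tx_tables]
use_tx_tables = 0
```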

Running on the same machine is not preferred, but can be a simple way to test things. Generally, you would want to run these on separate machines. You would just update the appropriate IPs in the config file, and everything should work fine. Running on different machines is preferred, as then the two processes are not competing for the same hardware. You can pummel your reporting node with RPC traffic without ever affecting your p2p node, and thus your ability to stay synced. The two processes might have different hardware requirements as well. A typical setup would be to keep minimal history on the p2p node, thus alleviating the need for a large SSD, and to keep large amounts of history on the reporting node.
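For a split across machines, the main change in the second (reporting) config is pointing [etl_source] at the p2p node's address; the IP below is a made-up example:

```
[etl_source]
source_ip=10.0.0.5
source_ws_port=6006
source_grpc_port=50051
```

On the p2p node, the [port_grpc] ip of 0.0.0.0 from the how-to already listens on all interfaces; you may also need to adjust secure_gateway there so the reporting node's address is trusted (an assumption based on the localhost example, not something stated in this thread).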

The first server is a p2p node. This could be a validator, or just a stock node. The reporting node continuously extracts data from the p2p node via protobuf over gRPC. Specifically, each time a ledger is validated, the reporting node extracts all of the transactions from that ledger, as well as the ledger state diff between this ledger and the previous. This diff is every object that was created, modified or deleted as part of building the new ledger. Here is the exact protobuf request and response: https://github.com/ripple/rippled/blob/develop/src/ripple/proto/org/xrpl/rpc/v1/get_ledger.proto

As I said, right now we have two different modes for rippled, but the long term goal is to move reporting mode to its own repo, along with most of the public RPC handlers. We are also working on replacing the SHAMap in reporting mode with a more time and space efficient data structure that is optimized for reporting. Once we do this, we are also looking to add support for PostgreSQL as the database to store ledger state data. Right now, PostgreSQL is only used for relational things, like account_tx, whereas ledger state data must be stored in either NuDB or Cassandra when running in reporting mode. Adding support for storing ledger state data in PostgreSQL will give users two options for a shared, network accessible backend for reporting mode.

Does this all make sense? Thanks for the good questions, and I'll be happy to answer any more.

treisto commented 3 years ago

Yep, it really does make sense to decouple these. Thank you so much for the detailed explanations! Just two more questions, please, hopefully the last :)

cjcobb23 commented 3 years ago
treisto commented 3 years ago

Got it. Many thanks once again, and a great day!

yxxyun commented 3 years ago

When the deprecated API is removed from the p2p node, can we still get the "open/closed" ledger data or subscribe to the "proposed" transactions?

cjcobb23 commented 3 years ago

When the deprecated API is removed from the p2p node, can we still get the "open/closed" ledger data or subscribe to the "proposed" transactions?

Yes, all of that will stay, in one way or another. You might have to issue those commands through the reporting node, but they will still be available. The design for this is not fully finalized yet.