treisto opened 3 years ago
Which distribution is this? Please post the output of `cat /etc/os-release`.
NAME="Ubuntu" VERSION="20.04.1 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.1 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal
@treisto Somehow you are trying to build the software in a special mode called reporting mode. You are right that you need `autoconf` for that, as well as a few other things. Did you configure cmake with `-Dreporting=ON`? It's supposed to default to off. Regardless, to get around this, reconfigure cmake with `-Dreporting=OFF` and rebuild. Or, if you do want to build reporting mode, install `autoconf`, as well as `bison` and `flex`.
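On Ubuntu, that would be roughly the following (just a sketch; the out-of-source `build` directory is an assumption about your setup):

```
sudo apt-get install autoconf bison flex
cd build
cmake -Dreporting=ON ..
cmake --build .
```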
Yep, you are right. "Reporting" was intentionally switched to ON, and indeed bison and flex were also needed.
Great. Just curious, are you trying to run the software in reporting mode? It's a very new feature, and unfortunately we are lacking some documentation for it, but I'd love to help you if you are trying to test it out.
Yes, thank you very much, it'd be great if you could. I was indeed wondering where I could get some more info about this: what it is intended to do, and more specifically, how it works and how to use it, for example.
We are working on generating detailed documentation right now. I can give you a quick how-to right here though:
1. Set up PostgreSQL, creating a user, password and database for the reporting node to use.
2. Create two config files, one for each rippled you will run: the **first** config file for the p2p node and the **second** config file for the reporting node.
3. Change the `[node_db]` section in the **second** config file, setting `type` to `NuDB` and `database_path` to something different than the database path in the first config file. Delete any `online_delete` config info, if present.
4. Add a `port_grpc` section to your **first** config file. The section should look like this:
```
[port_grpc]
ip = 0.0.0.0
port = 50051
secure_gateway = 127.0.0.1
```
5. Add `port_ws_admin_local` to the `server` section of your **first** config file (you can keep whatever is already there as well), and add the corresponding port section:
```
[server]
port_ws_admin_local

[port_ws_admin_local]
port = 6006
ip = 127.0.0.1
admin = 127.0.0.1
protocol = ws
```
6. Add the following sections to your **second** config file:
```
[reporting]
etl_source

[etl_source]
source_ip=127.0.0.1
source_ws_port=6006
source_grpc_port=50051
```
Make sure `source_grpc_port` matches the port used in the `[port_grpc]` section (step 4), and `source_ws_port` matches the port used in the `[port_ws_admin_local]` section (step 5).
7. Add the following section to the **second** config file:
```
[ledger_tx_tables]
conninfo = postgres://[username]:[password]@127.0.0.1/[database]
```
Fill in the appropriate parts of the above string with the user, password, and database you created when setting up PostgreSQL in step 1 (see the sketch after this list).
8. Change any port numbers in the **second** config file to avoid any conflicts with the **first** config file (each rippled needs to use different ports).
9. Change the `[debug_logfile]` section in the **second** config file so it doesn't conflict with the first.
10. Build rippled in reporting mode. Launch one rippled using the **first** config: `./rippled --conf=first.cfg`. Launch the second using the **second** config: `./rippled --conf=second.cfg`.
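For step 1, a minimal sketch of creating the PostgreSQL user and database on Ubuntu might look like this; the `reporting` user/database names and the password are only placeholders, so substitute your own:

```
sudo -u postgres psql -c "CREATE USER reporting WITH PASSWORD 'changeme';"
sudo -u postgres psql -c "CREATE DATABASE reporting OWNER reporting;"
```

With those placeholder values, the `conninfo` string from step 7 would be `postgres://reporting:changeme@127.0.0.1/reporting`.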
The second rippled will be running in reporting mode. These two servers talk to each other. The server running in reporting mode is optimized for RPC queries. It uses PostgreSQL instead of SQLite, and guarantees that there are no gaps in history. You can also use Apache Cassandra instead of NuDB for reporting mode servers, but that is intended to be used by organizations running public nodes that receive a lot of RPC traffic.
Currently, we are in the first iteration of this project. Subsequent iterations will bring even more optimizations to the RPC system.
Let me know if you have any more questions.
If you use Cassandra, you can run multiple reporting servers that share access to the same database.
Wow! Thank you so much for all these details! If you'd be so kind, I do still have a few questions, please.
Thank you very much once again!
I'd be happy to provide more details. Hopefully soon we can have all of this information in a blog post or something. By the way, we are calling stock nodes and validators p2p nodes now.
There are two main reasons for creating this new mode:
A server running in reporting mode is neither a validator nor a stock node. It does not connect to the p2p network, participate in consensus, or apply transactions. Instead, the reporting node connects to a p2p node (a stock node or a validator), and extracts transactions as well as their computed results from the p2p node. It then writes the transactions and results to the database. A reporting node only extracts validated data; it has no knowledge of the open ledger or transactions that have not been validated. For RPCs that require access to the open ledger, such as `fee` or `submit`, the reporting node forwards the request to a p2p node. Lastly, a reporting node never has ledger gaps.
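As a concrete (hypothetical) illustration of the forwarding: if the reporting rippled exposes a JSON-RPC port, say 5005 in the second config, a `fee` request sent to it is answered by asking the p2p node behind it:

```
# Ask the reporting node for current fee levels; it forwards this
# to the p2p node, since fees depend on the open ledger.
curl -s -d '{"method": "fee"}' http://127.0.0.1:5005/
```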
`[ledger_tx_tables]` is just configuration information about the transaction database. The transaction database is a relational database that supports `tx` and `account_tx`. In reporting mode, this is implemented in PostgreSQL, but a p2p node uses SQLite for this. There is an additional parameter called `use_tx_tables` in that section. When running in p2p mode, setting this parameter to `0` (it defaults to `1`) disables writing to the transaction database, which improves throughput but disables the `tx` and `account_tx` RPCs. This is only for p2p mode though; reporting mode always writes to the transaction database, and will error out if you try to set this to `0`.
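For example, based on the description above, a p2p node that doesn't need to serve `tx` or `account_tx` could disable the transaction database like this (sketch):

```
[ledger_tx_tables]
use_tx_tables = 0
```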
Running on the same machine is not preferred, but can be a simple way to test things. Generally, you would want to run these on separate machines. You would just update the appropriate IPs in the config files, and everything should work fine. Running on different machines is preferred, as then the two processes are not competing for the same hardware. You can pummel your reporting node with RPC traffic without ever affecting your p2p node, and thus your ability to stay synced. The two processes might have different hardware requirements as well. A typical setup would be to keep minimal history on the p2p node, thus alleviating the need for a large SSD, and to keep large amounts of history on the reporting node.
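Concretely, if the p2p node ran on a separate machine, the reporting node's `[etl_source]` section would just point at that machine instead of localhost; the IP below is only an example:

```
[etl_source]
source_ip=10.0.0.5
source_ws_port=6006
source_grpc_port=50051
```

Since the p2p node's `[port_grpc]` section binds `ip = 0.0.0.0`, it already listens on all interfaces; you may also need to update `secure_gateway` there to the reporting node's address, which is presumably part of "updating the appropriate IPs" mentioned above.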
The first server is a p2p node. This could be a validator, or just a stock node. The reporting node continuously extracts data from the p2p node over gRPC, using protobuf. Specifically, each time a ledger is validated, the reporting node extracts all of the transactions from that ledger, as well as the ledger state diff between this ledger and the previous one. This diff is every object that was created, modified or deleted as part of building the new ledger. Here is the exact protobuf request and response: https://github.com/ripple/rippled/blob/develop/src/ripple/proto/org/xrpl/rpc/v1/get_ledger.proto
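If you want to poke at that gRPC endpoint yourself, a generic tool like grpcurl can be pointed at the `[port_grpc]` port; this is just a sketch, and assumes grpcurl is installed and that either server reflection is available or you supply rippled's .proto files from `src/ripple/proto` via `-import-path`/`-proto`:

```
# List the gRPC services exposed on the port from the [port_grpc] section.
grpcurl -plaintext 127.0.0.1:50051 list
```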
As I said, right now we have two different modes for rippled, but the long-term goal is to move reporting mode to its own repo, along with most of the public RPC handlers. We are also working on replacing the SHAMap in reporting mode with a more time- and space-efficient data structure that is optimized for reporting. Once we do this, we are also looking to add support for PostgreSQL as the database to store ledger state data. Right now, PostgreSQL is only used for relational things, like `account_tx`, whereas ledger state data must be stored in either NuDB or Cassandra when running in reporting mode. Adding support for storing ledger state data in PostgreSQL will give users two options for a shared, network-accessible backend for reporting mode.
Does this all make sense? Thanks for the good questions, and I'll be happy to answer any more.
Yep it really does make sense to decouple these. Thank you so much for the detailed explanations! Just 2 more questions, please, hopefully last :)
`tx`, `account_tx`, `ledger`, `account_info`, `book_offers`, etc. will be handled only by a reporting node. But even then, the p2p node will still do all of the things you mentioned, except maybe store full history. While it will be possible to store full history on the p2p node after the full migration, you won't be able to really query it. You would need a reporting node to ingest that data to be able to query it. There will probably be a very minimal, binary-only interface to the history data still, but nothing like the full-fledged API we have now. But right now, there are no changes to the p2p node.

Got it. Many thanks once again, and have a great day!
Once the deprecated API is removed from the p2p node, can we still get the "open/closed" ledger data or subscribe to "proposed" transactions?
Yes all of that will stay, in one way or another. You might have to issue those commands through the reporting node, but they will still be available. The design on this is not fully finalized yet.
Hi, what about adding "autoconf" to the list of prerequisites here?
While compiling v1.7 for Ubuntu I got this error:
The error was solved by installing the following:
sudo apt-get install autoconf
Best, Lucian