XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License

Optimize Trust Lines #3866

Open mDuo13 opened 3 years ago

mDuo13 commented 3 years ago

Summary

Looking at the numbers from nixerFFM's ledger data analysis, it seems like we could probably save a lot of space in the ledger by optimizing trust lines (RippleState objects). We could reduce trust lines' size significantly with just these optimizations:

Motivation

Total ledger size is a key constraint in scaling the XRP Ledger. To sync to the network, a server must download the entire latest state from its peers, an amount that is currently about 1 GB + overhead. This also affects bandwidth usage of servers in the peer-to-peer network, which is one of the more expensive and restrictive factors in running a reliable server, although state data probably contributes less to bandwidth than transactions themselves (aside: we should confirm this empirically). Furthermore, to be useful, a server should store ledger history, which includes all state data that has changed since the previous ledger version. (With de-duplication, storing 10 ledgers takes far less than 10× the space of storing 1 ledger, but the size of individual objects is still significant.)

RippleState objects (trust lines) account for 31% of the data in a given ledger (~350 MB), a surprising amount of which is unnecessary. Here's a breakdown of the size of a single RippleState object:

FIELD NAME          SIZE IN BITS
================================
LedgerEntryType               16
Flags                         32 ("optional" but present in > 99.99% of cases)
Balance                      384 (160–224 wasted)
LowLimit                     384 (160–224 wasted)
HighLimit                    384 (160–224 wasted)
PreviousTxnID                256
PreviousTxnLgrSeq             32
LowNode                       64 (often has value 0)
HighNode                      64 (often has value 0)
--------------------------------
REQUIRED SUBTOTAL:          1872 (234 bytes)

OPTIONAL FIELDS
--------------------------------
LowQualityIn                  32
LowQualityOut                 32
HighQualityIn                 32
HighQualityOut                32
--------------------------------
MAXIMUM TOTAL SIZE:         2000 (250 bytes)
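For reference, the 384 bits of each Amount field come from the serialized layout of an issued-currency amount: a 64-bit value, a 160-bit currency code, and a 160-bit issuer account. The sketch below uses plain structs and constants, not rippled's STAmount class, and only spells out that arithmetic. It also shows one obvious source of redundancy: the Balance, LowLimit, and HighLimit of a trust line always share the same currency, so two of the three copies carry no information.

```cpp
#include <array>
#include <cstdint>

// Serialized layout of an issued-currency Amount (illustrative sketch only):
// 64-bit value (flag bits, sign, exponent, mantissa), 160-bit currency,
// 160-bit issuer account.
struct IssuedAmountWire {
    std::array<std::uint8_t, 8> value;
    std::array<std::uint8_t, 20> currency;
    std::array<std::uint8_t, 20> issuer;
};

constexpr int kValueBits = 64;
constexpr int kCurrencyBits = 160;
constexpr int kAccountBits = 160;

constexpr int issuedAmountBits() { return kValueBits + kCurrencyBits + kAccountBits; }

static_assert(issuedAmountBits() == 384, "issued-currency Amount is 384 bits");
// Byte arrays have alignment 1, so the struct has no padding.
static_assert(sizeof(IssuedAmountWire) * 8 == 384, "struct matches wire size");

// All amount fields on a trust line use one currency, so storing it once
// instead of once per field saves (fields - 1) copies of 160 bits.
constexpr int redundantCurrencyBits(int amountFields) {
    return (amountFields - 1) * kCurrencyBits;
}
```

With three amount fields, `redundantCurrencyBits(3)` is 320 bits per trust line from the repeated currency alone.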

A significant amount of space is wasted in this representation, especially the use of "Amount" type fields (384 bits each) for Balance, LowLimit, and HighLimit:

That adds up to a potential savings of between 480 and 992 bits per RippleState entry; my guess is that the average savings would be at least 608 bits (76 bytes) each. This would be offset slightly by an additional 1–3 bytes per optional field (when present) to identify it.

As of a recent ledger version, there are 834,479 RippleState entries, so an average savings of 76 bytes each adds up to about 63 MB, or about 5.5% of the ledger's total data. That's comparable in size to everything in the ledger that's not an account, owner directory, or trust line combined.
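The estimate above is straightforward arithmetic. A minimal sketch that reproduces it and backs out the total ledger size implied by the 5.5% figure, assuming decimal megabytes (1 MB = 1,000,000 bytes):

```cpp
#include <cstdint>

// Total savings in decimal MB for `entries` objects saving `bytesSavedEach`.
constexpr double totalSavingsMB(std::uint64_t entries, double bytesSavedEach) {
    return static_cast<double>(entries) * bytesSavedEach / 1'000'000.0;
}

// Ledger size implied by "the savings are `fraction` of the total".
constexpr double impliedLedgerMB(double savingsMB, double fraction) {
    return savingsMB / fraction;
}

constexpr double kSavingsMB = totalSavingsMB(834'479, 76.0);      // about 63.4 MB
constexpr double kLedgerMB = impliedLedgerMB(kSavingsMB, 0.055);  // about 1,153 MB
```

The implied total of roughly 1.15 GB is consistent with the "about 1 GB + overhead" figure given in the Motivation section.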

Solution

We should introduce an amendment (proposed name: OptimizeTrustLines) that modifies RippleState objects as follows:

(Credit to @nbougalis for brainstorming a lot of this.)

Since we can't realistically migrate old trust lines to the new format en masse, any space savings would be gradual and incremental, and would probably be more in the form of avoiding future storage increases rather than reducing the present storage needs.

Paths Not Taken

This proposal does not include any changes to the Flags field, although I think, realistically, the No Ripple settings are pretty confusing and not that useful, and we should change them. That's a different issue, though.

Nik originally suggested a 32-bit "Simple Asset Code" field to be used instead of the full 160-bit currency code for cases where the trust line is for a "standard" 3-character currency code, but I'm wary of making currency code data more confusing than it already is. Even though saving 16 bytes per object is significant, I think the fact that currency codes are always 160 bits under the surface is one of the rare cases of consistency in the XRP Ledger's protocol so I'd rather not ruin it. 😝
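For context, a "standard" currency code occupies the full 160-bit currency field even though only 24 bits carry the ASCII code: bytes 0–11 and 15–19 are zero, and bytes 12–14 hold the three characters. The sketch below builds that layout and then applies a hypothetical 32-bit packing to make the 16-byte saving concrete. The function `simpleAssetCode` and its encoding are illustrations only, not the actual encoding Nik proposed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using Currency160 = std::array<std::uint8_t, 20>;

// Standard 3-character currency code in the 160-bit currency field:
// bytes 0-11 are zero, bytes 12-14 hold the ASCII code, bytes 15-19 are zero.
Currency160 standardCurrency(const std::string& code) {
    assert(code.size() == 3);
    Currency160 out{};  // value-initialized: all 20 bytes are zero
    for (std::size_t i = 0; i < 3; ++i)
        out[12 + i] = static_cast<std::uint8_t>(code[i]);
    return out;
}

// HYPOTHETICAL 32-bit "Simple Asset Code": the three ASCII bytes packed into
// the low 24 bits. No such field exists in the protocol.
std::uint32_t simpleAssetCode(const Currency160& c) {
    return (std::uint32_t{c[12]} << 16) | (std::uint32_t{c[13]} << 8) |
           std::uint32_t{c[14]};
}

// Only currencies whose bytes outside 12-14 are all zero fit the compact form.
bool isStandardLayout(const Currency160& c) {
    for (std::size_t i = 0; i < c.size(); ++i)
        if ((i < 12 || i > 14) && c[i] != 0) return false;
    return c[12] != 0;  // an all-zero currency denotes XRP, not a 3-letter code
}

constexpr int kBytesSaved = (160 - 32) / 8;  // 16 bytes per object
```

The `isStandardLayout` check hints at the cost described above: every reader of the field would have to handle two representations of the same currency.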

To reduce the integration burden for API clients, it would probably be possible to "fake" metadata in the old format. This sounds messy though and I worry the precedent could set us up for doing more of this kind of thing forever, which would make it really hard to introduce new features.

Another alternative would be to introduce an entirely new model of unidirectional trust lines linking back to a token definition object like the one proposed in #2609. While that might be more flexible and more powerful overall, it would be a much bigger migration that would require more action on the part of issuers, client apps, and so on. While the possibilities of a clean break and restart are tempting, I'm not convinced the legacy issued currency functionality is so broken as to warrant throwing it all out like this. Payment and offer processing in particular (both highly complex, sensitive parts of the code) would probably require much more extensive rewrites to support such a change.

nixer89 commented 3 years ago

While I really like the idea, I'm a bit worried about tools and other applications that work with trust lines. There will be a "hard cut" at the point where the amendment gets enabled (if validators agree to enable it). But no one can really predict when that will be, except during the final two weeks, and even then it could still be vetoed.

So, it might make sense, as you described above, to have the ability to receive RippleState objects still in the old format.

But the problem is that there are (at least three?) commands that can return a RippleState object (ledger_entry, ledger_data, account_lines). And I agree that this would be a bit messy. At some point we would need to stop providing the old format, or people would never switch their tools over to the new one.

But once the "fake old RippleState" is implemented, it will be hard to get rid of it.

MarkusTeufelberger commented 3 years ago

I'm not sure there's much benefit to doing this optimization throughout the whole system and all the way out to the API; it might be enough to optimize only the in-memory object in the most common cases.

cjcobb23 commented 3 years ago

Clients that fetch raw ledger data (such as through account_objects or ledger_entry) or read transaction metadata would have to update to be able to read the new fields. This is significant because it affects code that interprets balance changes of issued currencies. However, the delivered_amount field would be unchanged, so this would only affect clients doing some relatively detailed processing.

I think we can sidestep this, at least in the case where a user requests data in expanded JSON format (setting binary to false). We can just modify the output to be consistent with the old format, just like we plan to do for account_lines. For the case where binary is set to true, the situation is trickier. However, I don't think we want to modify the output in that situation; I think users that request binary data want to see that data exactly as it is on the ledger. We could just include some type of warning or flag that indicates "This object is a trustline in the new format" or some such.

I think it's ok that we would need to do the output modification in multiple places. We can just make a helper function that detects whether an object is a trustline, and do the modification if necessary. We could even embed the logic inside STObject::getJson().
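A minimal sketch of such a helper, in C++17. Everything here is hypothetical: `CompactTrustLine` is an assumed new format (currency and both accounts stored once, the three amounts reduced to plain values), and these plain structs stand in for rippled's STObject and JSON types. Only the RippleState type code (0x0072) and the neutral Balance issuer (ACCOUNT_ONE, `rrrrrrrrrrrrrrrrrrrrBZbvji`) come from the existing format.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Ledger entry type code for RippleState objects ('r').
constexpr std::uint16_t kRippleStateType = 0x0072;

// Issuer shown on a legacy trust line's Balance (ACCOUNT_ONE, a placeholder).
const std::string kNeutralIssuer = "rrrrrrrrrrrrrrrrrrrrBZbvji";

// HYPOTHETICAL compact trust line: currency and accounts stored once,
// amounts kept as decimal value strings only.
struct CompactTrustLine {
    std::uint16_t ledgerEntryType;
    std::string currency;
    std::string lowAccount;
    std::string highAccount;
    std::string balance;
    std::string lowLimit;
    std::string highLimit;
};

// Old-style Amount object: currency, issuer, value.
struct LegacyAmount {
    std::string currency;
    std::string issuer;
    std::string value;
};

// Legacy-shaped view, emitted only for JSON output (binary = false).
struct LegacyTrustLine {
    LegacyAmount balance;
    LegacyAmount lowLimit;
    LegacyAmount highLimit;
};

// Hypothetical helper: if the object is a trust line, rebuild the old
// three-Amount shape. For any other object type, return nullopt so the
// caller can emit the object unchanged.
std::optional<LegacyTrustLine> toLegacyView(const CompactTrustLine& t) {
    if (t.ledgerEntryType != kRippleStateType)
        return std::nullopt;
    return LegacyTrustLine{
        {t.currency, kNeutralIssuer, t.balance},
        {t.currency, t.lowAccount, t.lowLimit},
        {t.currency, t.highAccount, t.highLimit},
    };
}
```

Keeping the conversion in one function is what would make it practical to call from several RPC handlers (or from inside a JSON serializer), while binary output stays exactly as stored.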

I do worry though that this is overly complex, and might not be worth it, since the space savings would only be incremental.

nixer89 commented 2 years ago

I want to bring this topic up again. Some time has passed since June, and we have recently seen some "token craziness" (still ongoing). The number of trust lines has exploded, and so has the ledger size.

Currently, 2.3 million trust lines account for 1 GB of ledger data. That is around 48% of the ledger size! (https://xrpldata.com/api/v1/ledgerdata).

As you can see here:

Out of 2.3 million trust lines, "only" 774k hold an actual balance. All the other trust lines hold no value or tokens but still add a large amount of data to the ledger. Maybe this could be a starting point for some improvements. But as @mDuo13 already mentioned, there are many "duplicate" fields inside the RippleState object which could be omitted.
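Put differently, roughly two thirds of trust lines hold no balance at all. A quick check of that arithmetic, using only the figures quoted in this comment:

```cpp
#include <cstdint>

// Figures from the comment: 2.3 million trust lines, about 774k with a balance.
constexpr std::uint64_t kTrustLines = 2'300'000;
constexpr std::uint64_t kWithBalance = 774'000;

// Trust lines that currently hold no balance.
constexpr std::uint64_t kEmpty = kTrustLines - kWithBalance;  // 1,526,000

// Fraction of all trust lines that are empty (about 0.66).
constexpr double kEmptyShare =
    static_cast<double>(kEmpty) / static_cast<double>(kTrustLines);
```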

MarkusTeufelberger commented 2 years ago

By the way, the mechanism meant to keep this from getting out of hand is the "Owner Reserve", and it was recently cut by 60% (from 5 to 2 XRP).

This means the 834,479 objects from the issue description cost about as much as 2.09 million trust lines do now. Since the 5 XRP price was already no deterrent to creating ~2 million trust lines, I would expect this to become a much larger issue at the new, lower price.
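The "2.09 million" figure follows directly from the two reserve amounts. A minimal sketch of the arithmetic, in whole XRP:

```cpp
#include <cstdint>

constexpr std::uint64_t kOldReserveXrp = 5;  // owner reserve before the cut
constexpr std::uint64_t kNewReserveXrp = 2;  // owner reserve after the cut
constexpr std::uint64_t kLines = 834'479;    // RippleState count from the issue

// XRP reserved by those trust lines at the old price.
constexpr std::uint64_t kLockedXrp = kLines * kOldReserveXrp;  // 4,172,395 XRP

// How many trust lines the same XRP covers at the new price (rounded down).
constexpr std::uint64_t kEquivalentLines = kLockedXrp / kNewReserveXrp;  // 2,086,197

// Size of the cut, in percent: (5 - 2) / 5 = 60%.
constexpr std::uint64_t kCutPercent =
    (kOldReserveXrp - kNewReserveXrp) * 100 / kOldReserveXrp;
```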

intelliot commented 1 year ago

notes:

mvadari commented 9 months ago

The STCurrency type introduced in #4789 likely makes this easier to do.

mvadari commented 5 months ago

As of ledger 87716591 (5/2/2024, 7:06:10 PM UTC), there were:

Resulting savings:

TYPE                  COUNT   SAVINGS EACH (BYTES)   TOTAL SAVINGS (MB)
=======================================================================
Total             5,917,790                     55                325.5
Empty Flags              22                      5              0.00011
Empty HighNode    2,075,445                      9                18.68
Empty LowNode     1,287,759                      9                11.59
0 Balance         2,552,138                      9                22.97
Unidirectional    5,915,733                      9                53.24

The total savings would be 432 MB, or 7.5% of the whole ledger.
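The table's rows sum to the quoted total. A small sketch that recomputes it from the counts and per-object savings, assuming decimal megabytes (1 MB = 1,000,000 bytes):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct SavingsRow {
    const char* type;
    std::uint64_t count;
    std::uint64_t bytesEach;
};

// Rows copied from the table above.
constexpr std::array<SavingsRow, 6> kRows{{
    {"Total", 5'917'790, 55},
    {"Empty Flags", 22, 5},
    {"Empty HighNode", 2'075'445, 9},
    {"Empty LowNode", 1'287'759, 9},
    {"0 Balance", 2'552'138, 9},
    {"Unidirectional", 5'915'733, 9},
}};

// Sum of count * bytesEach over all rows, evaluated at compile time.
constexpr std::uint64_t totalBytesSaved() {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < kRows.size(); ++i)
        sum += kRows[i].count * kRows[i].bytesEach;
    return sum;
}

constexpr double kTotalMB = totalBytesSaved() / 1'000'000.0;  // about 432 MB
```

The exact sum is 431,958,235 bytes, which rounds to the 432 MB quoted.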