Optimize Trust Lines

mDuo13 commented 3 years ago

Summary

Looking at the numbers from nixerFFM's ledger data analysis, it seems like we could probably save a lot of space in the ledger by optimizing trust lines (RippleState objects). We could reduce trust lines' size significantly with just these optimizations:

- Drop ACCOUNT_ONE from Balance
- Store the currency code once per trust line instead of 3 times (Balance/LowLimit/HighLimit)
- Make LowNode/HighNode fields optional (omit when their value would be 0, which is most common)
- Don't store the high/low limit if it's 0.

Motivation

Total ledger size is a key constraint in scaling the XRP Ledger. To sync to the network, a server must download the entire latest state from its peers, an amount that is currently about 1 GB + overhead. This also affects bandwidth usage of servers in the peer-to-peer network, which is one of the more expensive and restrictive factors in running a reliable server, although state data probably contributes less to bandwidth than transactions themselves (aside: we should confirm this empirically). Furthermore, to be useful, a server should store ledger history, which includes all state data that has changed since the previous ledger version. (With de-duplication, storing 10 ledgers takes far less than 10× the space of storing 1 ledger, but the size of individual objects is still significant.)

RippleState objects (trust lines) account for 31% of the data in a given ledger (~350 MB), a surprising amount of which is unnecessary. Here's a breakdown of the size of a single RippleState object:

FIELD NAME          SIZE IN BITS
================================
LedgerEntryType               16
Flags                         32 ("optional" but present in > 99.99% of cases)
Balance                      384 (160–224 wasted)
LowLimit                     384 (160–224 wasted)
HighLimit                    384 (160–224 wasted)
PreviousTxnID                256
PreviousTxnLgrSeq             32
LowNode                       64 (often has value 0)
HighNode                      64 (often has value 0)
--------------------------------
REQUIRED SUBTOTAL:          1872 (234 bytes)

OPTIONAL FIELDS
--------------------------------
LowQualityIn                  32
LowQualityOut                 32
HighQualityIn                 32
HighQualityOut                32
--------------------------------
MAXIMUM TOTAL SIZE:         2000 (250 bytes)

A significant amount of space is wasted in this representation, especially the use of "Amount" type fields (384 bits each) for Balance, LowLimit, and HighLimit:

- All three fields store the same currency code (160 bits each). We could save 320 bits by dropping two of the three.
- The address ACCOUNT_ONE in the Balance field (160 bits) is meaningless.
- The value (64 bits) of all three fields is included even when it's 0. I believe most trust lines are unidirectional, so we could save 64 bits or more per RippleState by omitting limit values of 0. (384 bits when all three values, both limits and the balance, are 0.)
- The LowNode and HighNode fields are "hints" to the owner directory, with a value of 0 for any trust line that appears in the first page of its owner's directory, which is the most common case. (Accounts outnumber trust lines 3 million to 800k.) We could save 128 bits on most trust lines by omitting these fields when their value is 0.

That adds up to a potential savings of between 480 and 992 bits per RippleState entry, with my guess being that on average the savings would be at least 608 bits (76 bytes) each. This would be offset slightly by an additional 1-3 bytes per optional field (when present) to identify it.

As of a recent ledger version, there are 834479 RippleState entries, so average savings of 76 bytes each adds up to about 63 MB or about 5.5% of the ledger's total data. That's comparable in size to everything in the ledger that's not an account, owner directory, or trust line combined.
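
The arithmetic behind these estimates can be reproduced with a short sketch. The constants are the figures quoted in this post. The breakdown of the 608-bit guess (fixed savings plus both node hints) is an assumption, since the post does not spell out how it was derived.

```python
# Per-entry savings in bits, using the figures quoted in the post.
DUPLICATE_CURRENCY_CODES = 2 * 160  # two of the three currency codes
ACCOUNT_ONE_IN_BALANCE = 160        # placeholder issuer stored in Balance
ZERO_NODE_HINTS = 2 * 64            # LowNode and HighNode both 0
ALL_VALUES_ZERO = 384               # the post's figure when both limits and the balance are 0

min_savings_bits = DUPLICATE_CURRENCY_CODES + ACCOUNT_ONE_IN_BALANCE
max_savings_bits = min_savings_bits + ZERO_NODE_HINTS + ALL_VALUES_ZERO
# Assumption: one combination that reproduces the 608-bit estimate.
guess_savings_bits = min_savings_bits + ZERO_NODE_HINTS

RIPPLE_STATE_COUNT = 834_479
total_saved_bytes = RIPPLE_STATE_COUNT * guess_savings_bits // 8

print(min_savings_bits, max_savings_bits, guess_savings_bits)  # 480 992 608
print(f"{total_saved_bytes / 1e6:.1f} MB")                     # 63.4 MB
```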

Solution

We should introduce an amendment (proposed name: OptimizeTrustLines) that modifies RippleState objects as follows:

- Make LowNode and HighNode optional fields, to be omitted when their value is 0.
- Treat the Balance, LowLimit, and HighLimit fields as legacy, to be removed whenever a trust line is updated. In their place, "new-style" trust lines have the following fields:
  - Currency (internal type Hash160, 160 bits): the full currency code for this trust line.
  - LineLowAccount (internal type AccountID, 160 bits): the low account.
  - LineHighAccount (internal type AccountID, 160 bits): the high account.
  - (Optional) LineLowLimit (64 bits): the low account's limit. Omitted if the limit is 0 (the default).
  - (Optional) LineHighLimit (64 bits): the high account's limit. Omitted if the limit is 0 (the default).
  - (Optional) LineBalance (64 bits): the current net balance of the trust line. Omitted if 0.
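
As a rough illustration of the proposed layout, here is a minimal sketch of a new-style trust line as a data structure. The class name `NewStyleTrustLine` and the method `payload_bits` are hypothetical and are not rippled code. The sketch counts only field payloads and ignores the 1-3 byte field identifiers.

```python
from dataclasses import dataclass
from typing import Optional

OLD_AMOUNT_BITS = 384  # each legacy Amount field: Balance, LowLimit, HighLimit


@dataclass
class NewStyleTrustLine:
    """Hypothetical in-memory view of the proposed fields (not rippled code)."""
    currency: bytes                         # Currency: 160-bit currency code
    line_low_account: bytes                 # LineLowAccount: 160-bit AccountID
    line_high_account: bytes                # LineHighAccount: 160-bit AccountID
    line_low_limit: Optional[bytes] = None  # LineLowLimit: 64 bits, None when 0
    line_high_limit: Optional[bytes] = None # LineHighLimit: 64 bits, None when 0
    line_balance: Optional[bytes] = None    # LineBalance: 64 bits, None when 0

    def payload_bits(self) -> int:
        """Bits of field payload, ignoring field-ID prefixes."""
        optional = (self.line_low_limit, self.line_high_limit, self.line_balance)
        return 3 * 160 + 64 * sum(value is not None for value in optional)


legacy_bits = 3 * OLD_AMOUNT_BITS
empty_line = NewStyleTrustLine(bytes(20), bytes(20), bytes(20))
full_line = NewStyleTrustLine(bytes(20), bytes(20), bytes(20),
                              bytes(8), bytes(8), bytes(8))

print(empty_line.payload_bits(), full_line.payload_bits())  # 480 672
print(legacy_bits - full_line.payload_bits())               # 480
```

With all three optional values present, the payload savings against the three legacy Amount fields come to 480 bits, which matches the lower bound given in the Motivation section.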

(Credit to @nbougalis for brainstorming a lot of this.)

- Transactions would migrate trust lines to the new style whenever they modified them for any reason, so old style trust lines would continue to exist indefinitely if unmodified. Several transactors in the payment engine would have to be modified to do this and to use the old or new fields depending on what the transactions had available; but at least the new fields are 1:1 bit-identical to the pieces of the old fields that are currently in use. Transactors would also have to handle some more cases of fields that can be omitted if their value is 0.
- API methods like account_lines would have to be updated to use the new fields (in addition to old fields) to return API responses. The actual response format for account_lines could remain the same.
- Clients that fetch raw ledger data (such as through account_objects or ledger_entry) or read transaction metadata would have to update to be able to read the new fields. This is significant because it affects code that interprets balance changes of issued currencies. However, the delivered_amount field would be unchanged, so this would only affect clients doing some relatively detailed processing.
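
The migrate-on-modify step could look roughly like the following sketch, which works on the JSON form of a RippleState object rather than the binary form. The function name `migrate_trust_line` and the example addresses are hypothetical. It assumes that UInt64 fields such as LowNode are rendered as hex strings in JSON, and the real transactor changes would live in rippled's C++ code.

```python
from decimal import Decimal

LEGACY_FIELDS = ("Balance", "LowLimit", "HighLimit", "LowNode", "HighNode")


def is_zero_value(value: str) -> bool:
    """True for any decimal string equal to zero, such as "0" or "0.0"."""
    return Decimal(value) == 0


def is_zero_node(node: str) -> bool:
    """Assumption: node hints are hex strings in JSON."""
    return int(node, 16) == 0


def migrate_trust_line(old: dict) -> dict:
    """Return a new-style copy of a legacy RippleState JSON object."""
    new = {k: v for k, v in old.items() if k not in LEGACY_FIELDS}
    new["Currency"] = old["Balance"]["currency"]
    new["LineLowAccount"] = old["LowLimit"]["issuer"]
    new["LineHighAccount"] = old["HighLimit"]["issuer"]
    renames = (("LowLimit", "LineLowLimit"),
               ("HighLimit", "LineHighLimit"),
               ("Balance", "LineBalance"))
    for old_name, new_name in renames:
        value = old[old_name]["value"]
        if not is_zero_value(value):  # zero values are simply omitted
            new[new_name] = value
    for node in ("LowNode", "HighNode"):
        if node in old and not is_zero_node(old[node]):
            new[node] = old[node]
    return new


# Example with placeholder addresses (not real accounts).
old_line = {
    "LedgerEntryType": "RippleState",
    "Balance": {"currency": "USD", "issuer": "rrrrrrrrrrrrrrrrrrrrBZbvji", "value": "-10"},
    "LowLimit": {"currency": "USD", "issuer": "rLowExample", "value": "0"},
    "HighLimit": {"currency": "USD", "issuer": "rHighExample", "value": "110"},
    "LowNode": "0",
    "HighNode": "1a",
}
new_line = migrate_trust_line(old_line)
print(sorted(new_line))
```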

Since we can't realistically migrate old trust lines to the new format en masse, any space savings would be gradual and incremental, and would probably be more in the form of avoiding future storage increases rather than reducing the present storage needs.

Paths Not Taken

This proposal does not include any changes to the Flags field although I think, realistically, the No Ripple settings are pretty confusing and not that useful, and we should change that—but that's a different issue.

Nik originally suggested a 32-bit "Simple Asset Code" field to be used instead of the full 160-bit currency code for cases where the trust line is for a "standard" 3-character currency code, but I'm wary of making currency code data more confusing than it already is. Even though saving 16 bytes per object is significant, I think the fact that currency codes are always 160 bits under the surface is one of the rare cases of consistency in the XRP Ledger's protocol so I'd rather not ruin it. 😝

To reduce the integration burden for API clients, it would probably be possible to "fake" metadata in the old format. This sounds messy though and I worry the precedent could set us up for doing more of this kind of thing forever, which would make it really hard to introduce new features.

Another alternative would be to introduce an entirely new model of unidirectional trust lines linking back to a token definition object like the one proposed in #2609. While that might be more flexible and more powerful overall, it would be a much bigger migration that would require more action on the part of issuers, client apps, and so on. While the possibilities of a clean break and restart are tempting, I'm not convinced the legacy issued currency functionality is so broken as to warrant throwing it all out like this. Especially payment and offer processing (both highly complex, sensitive parts of the code) would probably require much more extensive rewrites to support such a change.

nixer89 commented 3 years ago

While I really like the idea, I'm a bit worried about tools and other applications working with trust lines. There will be a "hard cut" at some point in time, when the amendment gets enabled (if validators agree to enable it). But no one can really predict when this will be, except during the last two weeks (and even then it could still be vetoed).

So, it might make sense, as you described above, to have the ability to receive RippleState objects still in the old format.

But the problem is that there are (at least 3?) commands that can return a RippleState object (ledger_entry, ledger_data, account_lines). And I agree that this would be a bit messy. At some point we would need to stop providing the old format, or people would never switch their tools over to the new format.

But once the "fake old RippleState" is implemented, it will be hard to get rid of it.

MarkusTeufelberger commented 3 years ago

I'm not sure if there's much benefit to do this optimization throughout the whole system and even out to the API - it might be enough to only optimize the in-memory object in the most common cases.

cjcobb23 commented 3 years ago

> Clients that fetch raw ledger data (such as through account_objects or ledger_entry) or read transaction metadata would have to update to be able to read the new fields. This is significant because it affects code that interprets balance changes of issued currencies. However, the delivered_amount field would be unchanged, so this would only affect clients doing some relatively detailed processing.

I think we can side step this, at least in the case where a user requests data in expanded JSON format (setting binary to false). We can just modify the output to be consistent with the old format, just like we plan to do for account_lines. For the case where binary is set to true, the situation is trickier. However, I don't think we want to modify the output in that situation; I think users that request binary data want to see that data exactly as it is on the ledger. We could just include some type of warning or flag that indicates "This object is a trustline in the new format" or some such.

I think it's ok that we would need to do the output modification in multiple places. We can just make a helper function that detects whether an object is a trustline, and do the modification if necessary. We could even embed the logic inside STObject::getJson().
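
A minimal sketch of such a helper, written against JSON-like dicts rather than rippled's C++ STObject, might look like this. The names `is_new_style_trust_line` and `to_legacy_json` are hypothetical, and the new field names are the ones proposed in the issue description.

```python
# ACCOUNT_ONE is the placeholder issuer used in the legacy Balance field.
ACCOUNT_ONE = "rrrrrrrrrrrrrrrrrrrrBZbvji"

NEW_ONLY_FIELDS = {"Currency", "LineLowAccount", "LineHighAccount",
                   "LineLowLimit", "LineHighLimit", "LineBalance"}


def is_new_style_trust_line(obj: dict) -> bool:
    """Detect a RippleState object that uses the proposed new fields."""
    return obj.get("LedgerEntryType") == "RippleState" and "Currency" in obj


def to_legacy_json(obj: dict) -> dict:
    """Rebuild the old JSON shape; other objects pass through unchanged."""
    if not is_new_style_trust_line(obj):
        return obj
    out = {k: v for k, v in obj.items() if k not in NEW_ONLY_FIELDS}
    currency = obj["Currency"]

    def amount(issuer: str, value_field: str) -> dict:
        # Omitted optional values default to "0", as in the proposal.
        return {"currency": currency, "issuer": issuer,
                "value": obj.get(value_field, "0")}

    out["Balance"] = amount(ACCOUNT_ONE, "LineBalance")
    out["LowLimit"] = amount(obj["LineLowAccount"], "LineLowLimit")
    out["HighLimit"] = amount(obj["LineHighAccount"], "LineHighLimit")
    out.setdefault("LowNode", "0")   # omitted node hints mean 0
    out.setdefault("HighNode", "0")
    return out


# Example with placeholder addresses (not real accounts).
new_line = {
    "LedgerEntryType": "RippleState",
    "Currency": "USD",
    "LineLowAccount": "rLowExample",
    "LineHighAccount": "rHighExample",
    "LineHighLimit": "110",
}
legacy = to_legacy_json(new_line)
print(legacy["LowLimit"], legacy["HighLimit"])
```

Keeping the detection and the conversion in one place means each API handler only needs a single call, which is the point of the helper suggested above.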

I do worry though that this is overly complex, and might not be worth it, since the space savings would only be incremental.

nixer89 commented 2 years ago

I want to bring this topic up again. Some time has passed since June and we saw some "token craziness" recently (and still ongoing). The number of TrustLines has exploded, and so did the ledger size.

Currently, 2.3 million Trustlines account for 1 GB of ledger data. That is around 48% of the ledger size! (https://xrpldata.com/api/v1/ledgerdata)

As you can see here:

Out of 2.3 million Trustlines, "only" 774k hold an actual Balance. All other Trustlines do not hold any value/token but add a big amount of data / size to the actual ledger. Maybe this could be a starting point to make some improvements. But as @mDuo13 already mentioned, there are many "duplicate" fields inside the RippleState object which could be omitted.

MarkusTeufelberger commented 2 years ago

The mechanism to make sure this doesn't get out of hand is the "Owner Reserve" by the way, and this was recently cut down by 60% (from 5 to 2 XRP).

This means the 834479 objects from the issue description were about as expensive as 2.09 million trustlines now. Since the 5 XRP price already was no deterrent to creating ~2 million trust lines, I would expect that this will become a much larger issue with the cheap price now.
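
The comparison can be checked with a few lines of arithmetic. This is a simplified sketch that treats each trust line as costing exactly one owner reserve, as the comment does.

```python
OLD_OWNER_RESERVE_XRP = 5
NEW_OWNER_RESERVE_XRP = 2
TRUST_LINES_IN_ISSUE = 834_479  # count from the issue description

reduction = 1 - NEW_OWNER_RESERVE_XRP / OLD_OWNER_RESERVE_XRP
reserved_xrp = TRUST_LINES_IN_ISSUE * OLD_OWNER_RESERVE_XRP
equivalent_lines_now = reserved_xrp / NEW_OWNER_RESERVE_XRP

print(f"{reduction:.0%}")                            # 60%
print(f"{equivalent_lines_now / 1e6:.2f} million")   # 2.09 million
```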

intelliot commented 1 year ago

notes:

- some community members have a concern about CheckCashMakesTrustline because of the trust line bloat issue.
- trust lines comprise >56% of all ledger data.
- in addition to the suggestions above, there may be other optimizations that reduce memory or storage usage without changing the "canonical" format.

mvadari commented 9 months ago

The STCurrency type introduced in #4789 likely makes this easier to do.

mvadari commented 5 months ago

As of ledger 87716591 (5/2/2024, 7:06:10 PM UTC), there were:

- 5,917,790 total trustlines
- 22 trustlines with a Flags value of 0
- 2,075,445 trustlines with a HighNode value of "0"
- 1,287,759 trustlines with a LowNode value of "0"
- 2,552,138 trustlines with a balance of 0
- 2,057 bidirectional trustlines (neither the LowLimit nor the HighLimit is 0)

Resulting savings:

TYPE                  COUNT  SAVINGS EACH (BYTES)  TOTAL SAVINGS (MB)
=====================================================================
Total             5,917,790                    55       325.5
Empty Flags              22                     5         0.00011
Empty HighNode    2,075,445                     9        18.68
Empty LowNode     1,287,759                     9        11.59
0 Balance         2,552,138                     9        22.97
Unidirectional    5,915,733                     9        53.24

The total savings would be 432 MB, or 7.5% of the whole ledger.
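
The figures in the table can be re-derived as count times bytes saved per trust line. The sketch below copies the counts and per-entry savings from this comment and assumes 1 MB = 10^6 bytes, which is what the table's totals imply.

```python
TOTAL_TRUST_LINES = 5_917_790
BIDIRECTIONAL = 2_057
unidirectional = TOTAL_TRUST_LINES - BIDIRECTIONAL

# (label, trust line count, bytes saved per trust line)
rows = [
    ("Total", TOTAL_TRUST_LINES, 55),
    ("Empty Flags", 22, 5),
    ("Empty HighNode", 2_075_445, 9),
    ("Empty LowNode", 1_287_759, 9),
    ("0 Balance", 2_552_138, 9),
    ("Unidirectional", unidirectional, 9),
]

for label, count, bytes_each in rows:
    print(f"{label:<16}{count * bytes_each / 1e6:>12.5f} MB")

total_saved_bytes = sum(count * bytes_each for _, count, bytes_each in rows)
print(f"{total_saved_bytes / 1e6:.0f} MB")  # 432 MB
```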

XRPLF / rippled

Optimize Trust Lines #3866
