XRPLF / rippled

Decentralized cryptocurrency blockchain daemon implementing the XRP Ledger protocol in C++
https://xrpl.org
ISC License
4.48k stars 1.45k forks source link

Address rare corruption of NFTokenPage linked list #4945

Open scottschurr opened 3 months ago

scottschurr commented 3 months ago

High Level Overview of Change

It was discovered that there are a couple of NFT directories on the MainNet that are missing links in the middle of the chain. This pull request does several things to address the corruption:

  1. The source of the corruption was identified and fixed, which should prevent the problem from happening in the future.
  2. A new transaction, LedgerStateFix, is added which allows any account on ledger to pay the fee to repair the links of any single account.
  3. More invariants are added which should prevent similar problems from happening in the future.
  4. And unit tests are added to exercise all of the above changes.

A new amendment, fixNFTokenPageLinks, includes all of these changes.

Details follow.

The Source of the Corruption

There are two different transactions that introduced the corruption into the two directories. In both cases the following conditions were met:

  1. There were at least two NFToken pages in the directory.
  2. The next-to-last page was completely full, holding 32 NFTokens.
  3. The very last page of the directory contained only one NFToken, and the transaction removed that last remaining token from the last page.

When these conditions were met, the last NFToken page was removed and the next-to-last page was left as the final page in the directory.

That would be fine except the NFToken directory has a an expectation that the last page has a specific index. The page with that index was just deleted. So when an NFToken is added to the directory, and that token has a high enough value that it does not belong on the new last page, then a new last page is created that has no links to the previous page.

Now there's a linked list with a hole in the middle of the list.

The fixNFTokenPageLinks amendment modifies the NFToken page coalescing code to notice when the very last page of the directory would be removed. In that case it simply moves all of the contents of the next lower page into the last page and deletes the next-to-last page (and fixes up the links).

New invariant checks also validate aspects of the links on pages, so a similar corruption should cause a tecINVARIANT_FAILED transaction result. That should prevent similar corruptions in the future.

The New Transaction: LedgerStateFix

Once the amendment goes live then that prevents any similar NFToken directory corruption in the future. But we are still left with two damaged NFToken directories. And potentially more that could be created between now and when the amendment goes live.

We could leave the corruptions. I don't like that idea for two reasons:

  1. These corruptions seriously inconvenience those accounts that suffered the corruption, through no fault of their own.
  2. When you drop food on the kitchen floor you clean it up. You don't put up a sign that says, please don't step here. You clean it up.

We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:

  1. As mentioned before, additional directories could face similar corruption between now and when the amendment goes live. We would like to be able to clean up all damaged NFToken directories, regardless of when they are created.
  2. The MainNet is not the only place running the XRP Ledger code. Other networks may have similar corruptions that need to be fixed. The approach to the fix should be applicable to other networks as well.

We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.

But we would also rather not introduce a new single-use transaction for this very tiny corner case.

So this approach creates a new transaction aimed at cleaning up messes in the ledger: LedgerStateFix. We'll use it for this situation. If there are other repairable corruptions in the future we can re-use this transaction to repair those future corruptions.

LedgerStateFix

LedgerStateFix fields (in addition to common fields) are: Field Name Data Type Required/Optional Description
TransactionType uint16 Required LedgerStateFix
Account STAccount Required Account that signs and pays fee
Fee STAmount Required Fee paid for transaction
Flags uint32 Optional Available for future fixes
LedgerFixType uint16 Required Only 1 is currently supported
Owner STAccount Optional Required if LedgerFixType == 1

TransactionType — identifies this as a LedgerStateFix transaction.

Account — identifies the account signing and submitting the transaction as well as paying the Fee.

Fee — this transaction is rare and potentially compute intensive. We want to discourage folks from spamming the ledger with these. So the minimum fee is the same as the fee for an AccountDelete transaction. And, yes, if the transaction fails with a tec code the fee is still charged.

Flags — not needed for LedgerFixType == 1. May be used for some unknown future ledger fix.

LedgerFixType — for now the only valid value is 1, which fixes the NFToken directory for a single account.

Owner — the account that owns the NFToken directory that needs fixing. Need not have any relationship to Account.

Fixing NFTokenPage Links

Once the amendment goes live, any corrupted NFTokenPage directory can be fixed by submitting a transaction like this:

{
   "Account" : "<Signer and fee payer>",
   "Fee" : "2000000",
   "LedgerFixType" : 1,
   "Owner" : "<Account with corrupted NFTokenPage directory",
   "Sequence" : <n>,
   "SigningPubKey" : "<Account's public key>",
   "TransactionType" : "LedgerStateFix",
   "TxnSignature" : "<Signature>",
}

If a repair is actually completed, then tesSUCCESS is returned. If there is an error or if the directory did not need repair then a tec code is returned.

Future Fixes

Unfortunately this is not the first time ledger state has needed tidying up. Amendment fixTrustLinesToSelf removed invalid state from the ledger. Since programmers are humans we can expect similar kinds of problems to show up in the future.

If there are future ledger state problems, they can be attached to the LedgerStateFix transaction by picking a LedgerFixType value other than 1. Any additional parameters can be added as optional values.

Context of Change

While I was research how efficient NFTokenPages actually are in use, I stumbled across two NFToken directories that were missing links in the middle of the chain. This is a form of ledger corruption. But its consequences do not seem too dire, even for the owners of those directories. Known consequences are:

  1. The account_objects RPC command returns only those objects at the beginning of the NFToken directory up to the point where a link is missing. So, for those accounts with missing links, the NFTokens returned by account_objects is incomplete.
  2. The account_nfts RPC command has a problem similar to account_objects. account_nfts returns NFTokens only up to the point where a link is missing.

There may be other consequences, but I've looked for and not found any. Particularly, I've not seen any evidence of lost NFTokens. NFTokens still get added to the correct owner's page. They simply are not reported by the RPC commands.

Type of Change

API Impact

Consideration

This NFT-related amendment arrives at about the same time as a different NFT-related amendment: https://github.com/XRPLF/rippled/pull/4946

Should these two pull requests be merged into a single amendment?

codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 88.96714% with 94 lines in your changes are missing coverage. Please review.

Project coverage is 77.04%. Comparing base (c28e005) to head (eeb1098). Report is 8 commits behind head on develop.

Files Patch % Lines
src/test/app/NFTokenBurn_test.cpp 86.84% 19 Missing and 26 partials :warning:
src/test/app/FixNFTokenPageLinks_test.cpp 91.35% 10 Missing and 13 partials :warning:
src/ripple/app/tx/impl/details/NFTokenUtils.cpp 76.00% 14 Missing and 4 partials :warning:
src/ripple/app/tx/impl/LedgerStateFix.cpp 87.50% 3 Missing and 1 partial :warning:
src/ripple/app/tx/impl/InvariantCheck.cpp 85.00% 0 Missing and 3 partials :warning:
src/test/ledger/Invariants_test.cpp 99.04% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #4945 +/- ## ============================================ + Coverage 61.64% 77.04% +15.39% ============================================ Files 806 1131 +325 Lines 70967 132505 +61538 Branches 36690 39815 +3125 ============================================ + Hits 43745 102083 +58338 - Misses 19865 24527 +4662 + Partials 7357 5895 -1462 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

exunda commented 1 month ago

Why do you use the custom transaction method instead of doing this automatically which is what happens in activateTrustLinesToSelfFix which is called when the amendment is activated? Are you worried that more corrupted pages will create before the amendment activates?

scottschurr commented 4 weeks ago

@exunda asked:

Why do you use the custom transaction method instead of doing this automatically which is what happens in activateTrustLinesToSelfFix which is called when the amendment is activated?

That's a fair question. It was answered in the (admittedly long) description of the pull request. Here's the quote:

We could put in code that cleans up the two known corrupted directories when the amendment goes live. That's not a good approach for two reasons:

  1. As mentioned before, additional directories could face similar corruption between now and when the amendment goes live. We would like to be able to clean up all damaged NFToken directories, regardless of when they are created.
  2. The MainNet is not the only place running the XRP Ledger code. Other networks may have similar corruptions that need to be fixed. The approach to the fix should be applicable to other networks as well.

We need a transaction to clean up the directories. There are no transactions that just naturally walk an NFToken directory. So we can't leverage an existing transaction. There are RPC commands that walk the directory, but RPC commands can't modify ledger state.

Does that answer your question?

exunda commented 3 weeks ago

Yes thank you for explaining more