Xahau / xahaud

Codebase for Xahaud - The consensus, RPC & blockchain app for the Xahau network.
https://xahau.network
ISC License

Maximum directory limit hits when there are around 45k hook states. (Version: 2024.8.20-release+962 ) #356

Closed BimsaraFernando closed 2 weeks ago

BimsaraFernando commented 3 weeks ago

Issue Description

Steps to Reproduce

When there are more than 45k hook states, the Xahaud node prints the max directory limit error.

Expected Result

Hook states should be created since the maximum directory limit is greater than that.

Actual Result

xahaud node logs show the following error.

2024-Sep-03 15:07:31.552141610 UTC View:WRN HookError[TX:BCE1FE2410A1D712B4A138AD736658AA56AD672C4BB2E569E6B30BDFEC15F824]: SetHookState failed: 121 Key: 00000000CA180000000000003FA2C9F83E1D2E9CF57B40EAAF31C29F34DB668B Value: 470A000000000000

But when the hook state count is checked at 2024-Sep-03 15:07:31.552141610 UTC, it shows only 45248 states.

Environment

Xahaud mainnet Account - rsfTBRAbD2bYjVuXhJ2RReQXxR4K5birVW

Supporting Files

(Attached image: image_2024_09_04T10_49_56_848Z)

dangell7 commented 3 weeks ago

The hook::finalizeHookState calls here and here should have returned the error instead of ignoring it. The error was tecDIR_FULL. Had it been returned, the rest of the issues would not have occurred (we will fix this). However, I'm still looking into why the tecDIR_FULL error was returned in the first place.

RichardAH commented 3 weeks ago

I suspect it was returned because it was at max pages, but the pages were mostly empty? Going to look into it more as soon as I can. Use FH to inspect directory nodes. https://richardah.github.io/xrpl-keylet-tools/

dangell7 commented 3 weeks ago

Many of the pages have only 11 entries, not the full 32.

RichardAH commented 3 weeks ago

It's caused by constant deleting and adding state to a namespace. Look at the first directory page:

{
  "result": {
    "index": "EBAFA545CDF3D15FB785F75CB57B1DF116F3252B42B96041377B02A9AC789D52",
    "ledger_hash": "476029E022F6F7A46972A540284A23AAA91E1FDC33D428E4FF7367A73E72833E",
    "ledger_index": 8248527,
    "node": {
      "Flags": 0,
      "IndexNext": "3d101",
      "IndexPrevious": "3ffff",
      "Indexes": [],
      "LedgerEntryType": "DirectoryNode",
      "Owner": "rsfTBRAbD2bYjVuXhJ2RReQXxR4K5birVW",
      "RootIndex": "EBAFA545CDF3D15FB785F75CB57B1DF116F3252B42B96041377B02A9AC789D52",
      "index": "EBAFA545CDF3D15FB785F75CB57B1DF116F3252B42B96041377B02A9AC789D52"
    },
    "validated": true
  },
  "status": "success",
  "type": "response"
}

Notice that the next page is not index 1, but rather index 250113 (0x3d101). This is because all of the intervening pages have since been deleted, and the doubly linked list now goes from page 0 (which can't be deleted unless the dir is completely empty) straight to page 250113.
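The page numbers in the dump are hexadecimal strings, so the decimal figure quoted above can be checked with a couple of lines. This is only a small arithmetic sketch; the variable names are made up for illustration.

```python
# Page indexes copied from the DirectoryNode dump above (hex strings).
index_next = int("3d101", 16)      # first page after the root
index_previous = int("3ffff", 16)  # last page in the linked list

# Number of page indexes between the first and last non-root page, inclusive.
index_span = index_previous - index_next + 1

print(index_next)      # 250113
print(index_previous)  # 262143
print(index_span)      # 12031
```

So every page index below 250113 (other than the root) is gone, and the last page already sits at index 262143.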

The rippled code only checks the page index number, not how many pages actually exist. https://github.com/Xahau/xahaud/blob/833df20fce95493c72c161e44415b3f448351c86/src/ripple/ledger/impl/ApplyView.cpp#L95
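To make the failure mode concrete, here is a toy model of it in Python. It is not the rippled/xahaud code: `ToyDirectory`, `churn`, and the default cap of 262144 pages are assumptions for illustration (the cap is chosen to be consistent with the `IndexPrevious` of 0x3ffff in the dump; the 32 entries per page comes from the thread). The model appends to the last page, allocates the next page index when that page is full, and refuses once the index reaches the cap, regardless of how many pages are still alive.

```python
from collections import deque


class ToyDirectory:
    """Simplified stand-in for a paged owner directory (illustrative only)."""

    def __init__(self, max_pages=262144, max_entries=32):
        self.max_pages = max_pages      # assumed cap on the page INDEX
        self.max_entries = max_entries  # entries per page
        self.pages = {0: []}            # page index -> entries; root is page 0
        self.where = {}                 # key -> page index holding it
        self.last = 0                   # index of the last page

    def insert(self, key):
        page = self.pages[self.last]
        if len(page) >= self.max_entries:
            nxt = self.last + 1
            # The check looks at the index, not at len(self.pages).
            if nxt >= self.max_pages:
                return False            # models tecDIR_FULL
            page = self.pages[nxt] = []
            self.last = nxt
        page.append(key)
        self.where[key] = self.last
        return True

    def remove(self, key):
        idx = self.where.pop(key)
        page = self.pages[idx]
        page.remove(key)
        # Empty non-root pages are unlinked; the root page always stays.
        if not page and idx != 0:
            del self.pages[idx]
            if idx == self.last:
                self.last = max(self.pages)


def churn(directory, live_limit):
    """Insert fresh keys and delete the oldest, keeping ~live_limit alive."""
    live = deque()
    key = 0
    while directory.insert(key):
        live.append(key)
        if len(live) > live_limit:
            directory.remove(live.popleft())
        key += 1
    return key, len(live), len(directory.pages)


# Small cap so the demo finishes instantly.
failed_key, live_entries, live_pages = churn(ToyDirectory(max_pages=8), 40)
print(failed_key, live_entries, live_pages)
```

With a cap of 8 pages, the model has nominal room for 256 entries, yet the insert of key 256 is rejected while only 40 entries and 3 pages are alive. That is the same shape as the report: constant create/delete churn pushes the page index to the cap long before the directory is actually full.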

In other words, this is a pre-existing rippled bug that we've just stumbled upon due to heavy state use. There are a few ways to fix it, but for now I will advise the Evernode programmers to periodically rotate namespaces.

WietseWind commented 3 weeks ago

Nice find 🫨

RichardAH commented 3 weeks ago

I think there's no reason to have this condition. The sfIndex fields are uint64s, so they can continue to increment essentially indefinitely. The number of ledgers it would take to make 32 * 2**64 hook state creates and deletes would very likely exceed the lifespan of the human race.
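As a rough sanity check on that estimate, the arithmetic can be written out. The sustained rate of 1,000 create/delete operations per second is a purely hypothetical assumption picked for illustration, not a measured figure for the network.

```python
ENTRIES_PER_PAGE = 32           # entries per directory page, per the thread
PAGE_INDEX_SPACE = 2**64        # distinct values of a uint64 page index
ASSUMED_OPS_PER_SECOND = 1_000  # hypothetical sustained churn rate
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Creates/deletes needed to walk through the whole uint64 index space.
total_ops = ENTRIES_PER_PAGE * PAGE_INDEX_SPACE

# Time needed at the assumed rate.
years = total_ops / ASSUMED_OPS_PER_SECOND / SECONDS_PER_YEAR

print(total_ops)          # 590295810358705651712
print(f"{years:.2e} years")
```

Even at that assumed rate, the result is on the order of tens of billions of years.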

dangell7 commented 2 weeks ago

https://github.com/Xahau/xahaud/pull/359