bnb-chain / op-geth

GNU Lesser General Public License v3.0

fix: ignore truncation target range as flush not operated on time #131

Closed krish-nr closed 3 months ago

krish-nr commented 3 months ago

Description

Ignore the truncation target range check when the flush has not run in time.

Rationale

The bufferlist hits an update failure after an unclean shutdown, which leads to a panic. Ignore the range check in this scenario.

Example

N/A

Changes

will-2012 commented 3 months ago

https://github.com/bnb-chain/op-geth/blob/887404faf9cb5d5c0cf3b3a974889a71d20436f2/trie/triedb/pathdb/disklayer.go#L332-L342

==>

        if _, ok := dl.buffer.(*nodebufferlist); ok {
            persistentID := rawdb.ReadPersistentStateID(dl.db.diskdb)
            // Nothing to prune while the history is still within the configured limit.
            if limit >= persistentID {
                log.Info("No prune ancient under nodebufferlist, less than db config state history limit", "persistent_id", persistentID, "limit", limit)
                return ndl, nil
            }
            targetOldest := persistentID - limit + 1
            realOldest, err := dl.db.freezer.Tail()
            // Skip truncation when the target does not lie beyond the real tail.
            if err == nil && targetOldest <= realOldest {
                log.Info("No prune ancient under nodebufferlist due to truncate oldest less than real oldest, which may have happened in an abnormal restart",
                    "target_oldest_id", targetOldest, "real_oldest_id", realOldest, "error", err)
                return ndl, nil
            }
            oldest = targetOldest
            log.Info("Forcing prune ancient under nodebufferlist", "disk_persistent_state_id",
                persistentID, "truncate_tail", oldest)
        }

may be better.

The problem probably occurs because there can be a gap between writing the WAL and writing the state ID. The WAL is written when a commit reaches the disk layer, while the state ID is written during the disk layer's background flush. As a result, the state ID (ReadPersistentStateID) may be smaller than the actual WAL head, and stateid - limit may be smaller than the actual WAL tail.
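To make the gap concrete, here is a minimal standalone sketch of the decision the snippet above makes. `pruneTarget` and the numbers passed to it are hypothetical and are not part of op-geth. The sketch only mirrors the two guards (history limit not exceeded, target not beyond the real tail) on plain integers.

```go
package main

import "fmt"

// pruneTarget is a hypothetical helper, not an op-geth API. It mirrors the
// two guards from the suggested snippet: given the persisted state ID, the
// configured history limit, and the current WAL (freezer) tail, it returns
// the new oldest ID to truncate to and whether truncation should run.
func pruneTarget(persistentID, limit, realOldest uint64) (uint64, bool) {
	// History is still within the configured limit: nothing to prune.
	if limit >= persistentID {
		return 0, false
	}
	targetOldest := persistentID - limit + 1
	// The persisted state ID lags behind the WAL, so the computed target
	// does not lie beyond the real tail: skip instead of truncating.
	if targetOldest <= realOldest {
		return 0, false
	}
	return targetOldest, true
}

func main() {
	// Assumed values: the WAL tail is already at 31, but the persisted
	// state ID is stale (100) because the background flush did not run.
	if _, ok := pruneTarget(100, 90, 31); !ok {
		fmt.Println("stale state ID: skip truncation")
	}
	// Once the state ID has caught up (150), truncation proceeds.
	if target, ok := pruneTarget(150, 90, 31); ok {
		fmt.Println("truncate tail to", target)
	}
}
```

With the stale state ID the target (11) is behind the real tail (31), so the range check inside truncation would fail. Returning early avoids that path.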

will-2012 commented 3 months ago

Please refine the PR title and description; "unclean shutdown happens" is misleading.