erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

block files: split _dirtyFiles and _visibleFiles #11417

Closed AskAlexSharov closed 1 day ago

AskAlexSharov commented 1 month ago

Example from state files in E3:

// dirtyFiles - list of ALL files - including: un-indexed-yet, garbage, merged-into-bigger-one, ...
// thread-safe, but maybe need 1 RWLock for all trees in Aggregator
//
// _visibleFiles derivative from field `file`, but without garbage:
//  - no files with `canDelete=true`
//  - no overlaps
//  - no un-indexed files (`power-off` may happen between .ef and .efi creation)
//
// BeginRo() using _visibleFiles in zero-copy way
dirtyFiles *btree2.BTreeG[*filesItem]

// _visibleFiles - underscore in name means: don't use this field directly, use BeginFilesRo()
// underlying array is immutable - means it's ready for zero-copy use
_visibleFiles []ctxItem

This means all Open/Reopen/Build/Merge operations can work with _dirtyFiles as much as they want. Then recalcVisibleFiles() is called once: it creates a NEW list of visible files and saves it in _visibleFiles. After that, all RPC requests just use list := _visibleFiles.Load() (mutex-free), and the list is guaranteed to always be immutable.

func (a *Aggregator) OpenFolder() error {
    defer a.recalcVisibleFiles()
    // ...
}

func (a *Aggregator) Close can just wait until all existing roTx have finished.
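A minimal sketch of the dirty/visible split described above, assuming a mutex-guarded dirty list and an `atomic.Pointer` for the published snapshot. The names `fileSet`, `fileItem`, `addDirty`, `recalcVisible` and `beginRo` are hypothetical and are not Erigon's actual types:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fileItem is a hypothetical stand-in for the real *filesItem.
type fileItem struct {
	name      string
	indexed   bool
	canDelete bool
}

// fileSet keeps a mutable dirty list and an immutable visible snapshot.
type fileSet struct {
	mu      sync.Mutex // guards dirty only; readers never take it
	dirty   []*fileItem
	visible atomic.Pointer[[]*fileItem] // replaced wholesale, never mutated in place
}

// addDirty registers a file; it stays invisible until recalcVisible runs.
func (s *fileSet) addDirty(f *fileItem) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.dirty = append(s.dirty, f)
}

// recalcVisible builds a NEW slice and publishes it with one atomic store.
func (s *fileSet) recalcVisible() {
	s.mu.Lock()
	defer s.mu.Unlock()
	next := make([]*fileItem, 0, len(s.dirty))
	for _, f := range s.dirty {
		if f.indexed && !f.canDelete {
			next = append(next, f)
		}
	}
	s.visible.Store(&next)
}

// beginRo returns the current snapshot without locking or copying.
func (s *fileSet) beginRo() []*fileItem {
	if p := s.visible.Load(); p != nil {
		return *p
	}
	return nil
}

func main() {
	s := &fileSet{}
	s.addDirty(&fileItem{name: "a", indexed: true})
	s.addDirty(&fileItem{name: "b"}) // not indexed yet, stays hidden
	s.recalcVisible()
	old := s.beginRo() // a reader that started before the next recalc

	s.addDirty(&fileItem{name: "c", indexed: true})
	s.recalcVisible()
	fmt.Println(len(old), len(s.beginRo())) // prints "1 2"
}
```

The old reader keeps seeing its one-file list because the second recalc stores a fresh slice instead of appending to the published one.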

From E3 docs:

## Work with files

### Consistent files view

Let's forget about `DB` and focus on files in this section.

`Aggregator` - like `RoDb` but on files. Responsible for creating/merging/deleting files (in the background).

`AggregatorRoTx` - like `RoTx` but for files. (I plan to rename it to `View`). Responsible for guaranteeing a consistent view of files, even if merge/delete/new_file_create/etc. happens in the background. Only `AggregatorRoTx` has methods to read data from files. ["snapshot isolation" level](https://en.wikipedia.org/wiki/Snapshot_isolation)

`!` Each file has a `ref-counter`. `BeginFilesRo` increments it and `aggTx.Close` decrements it. If `aggTx.Close` sees the file marked as `readyForDelete` with 0 readers, it deletes the file.

`!` All files (including not-indexed, overlapped, ready for delete) are in `Domain.dirtyFiles`. Then `d.reCalcVisibleFiles()` is called to produce `Domain.visibleFiles`. `visibleFiles` is designed for zero-copy use by the `BeginFilesRo()` method, which must be fast because it is called on every RPC request.
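The ref-counter rule above can be sketched as follows. This is an illustration under assumptions, not the real implementation: `fileItem`, `roTx`, `beginFilesRo` and `closeAndRemove` are hypothetical names, and the `removed` flag stands in for actually closing and unlinking the file.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fileItem is a hypothetical file handle with a reader counter.
type fileItem struct {
	name      string
	refcount  atomic.Int32
	canDelete atomic.Bool
	removed   bool // stands in for "closed and deleted from disk"
}

func (f *fileItem) closeAndRemove() { f.removed = true }

// roTx pins every file it can see until Close is called.
type roTx struct{ files []*fileItem }

// beginFilesRo increments the counter of each visible file.
func beginFilesRo(visible []*fileItem) roTx {
	for _, f := range visible {
		f.refcount.Add(1)
	}
	return roTx{files: visible}
}

// Close decrements the counters; the last reader of a file that is
// marked canDelete is the one that removes it.
func (tx *roTx) Close() {
	for _, f := range tx.files {
		if f.refcount.Add(-1) == 0 && f.canDelete.Load() {
			f.closeAndRemove()
		}
	}
	tx.files = nil
}

func main() {
	f := &fileItem{name: "small"}
	visible := []*fileItem{f}

	tx1 := beginFilesRo(visible)
	tx2 := beginFilesRo(visible)
	f.canDelete.Store(true) // e.g. the file was merged into a bigger one

	tx1.Close()
	fmt.Println(f.removed) // prints "false": tx2 still reads the file
	tx2.Close()
	fmt.Println(f.removed) // prints "true": the last reader removed it
}
```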

I think this task will solve the following issue:

==================
WARNING: DATA RACE
Read at 0x00c080da3e68 by goroutine 1911:
  github.com/erigontech/erigon/turbo/snapshotsync/freezeblocks.(*BlockReader).HeaderByHash()
      /home/ubuntu/erigon/turbo/snapshotsync/freezeblocks/block_reader.go:416 +0x2d6
  github.com/erigontech/erigon/eth/protocols/eth.AnswerGetBlockHeadersQuery()
      /home/ubuntu/erigon/eth/protocols/eth/handlers.go:58 +0xfd4
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).getBlockHeaders66.func1()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:640 +0xc4
  github.com/erigontech/erigon-lib/kv/temporal.(*DB).View()
      /home/ubuntu/erigon/erigon-lib/kv/temporal/kv_temporal.go:103 +0x18d
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).getBlockHeaders66()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:639 +0x39a
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).handleInboundMessage()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:802 +0x13b
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).HandleInboundMessage()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:773 +0x128
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).HandleInboundMessage-fm()
      <autogenerated>:1 +0x6d
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.pumpStreamLoop[go.shape.*uint8].func2()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:247 +0x162

Previous write at 0x00c080da3e68 by goroutine 133:
  github.com/erigontech/erigon/turbo/snapshotsync/freezeblocks.(*Segment).closeIdx()
      /home/ubuntu/erigon/turbo/snapshotsync/freezeblocks/block_snapshots.go:154 +0x358
  github.com/erigontech/erigon/turbo/snapshotsync/freezeblocks.(*RoSnapshots).buildMissedIndices.func2()
      /home/ubuntu/erigon/turbo/snapshotsync/freezeblocks/block_snapshots.go:843 +0x31c
  github.com/tidwall/btree.(*Map[go.shape.int,go.shape.*uint8]).nodeScan()
      /home/ubuntu/go/pkg/mod/github.com/tidwall/btree@v1.6.0/map.go:279 +0x2f8
  github.com/tidwall/btree.(*Map[go.shape.int,go.shape.*uint8]).scan()
      /home/ubuntu/go/pkg/mod/github.com/tidwall/btree@v1.6.0/map.go:270 +0xa4
  github.com/tidwall/btree.(*Map[go.shape.int,go.shape.*uint8]).Scan()
      /home/ubuntu/go/pkg/mod/github.com/tidwall/btree@v1.6.0/map.go:259 +0x2b
  github.com/erigontech/erigon/turbo/snapshotsync/freezeblocks.(*RoSnapshots).buildMissedIndices()
      /home/ubuntu/erigon/turbo/snapshotsync/freezeblocks/block_snapshots.go:835 +0x6b7
  github.com/erigontech/erigon/turbo/snapshotsync/freezeblocks.(*RoSnapshots).buildMissedIndicesIfNeed()
      /home/ubuntu/erigon/turbo/snapshotsync/freezeblocks/block_snapshots.go:743 +0x21a
  github.com/erigontech/erigon/turbo/snapshotsync/freezeblocks.(*BlockRetire).BuildMissedIndicesIfNeed()
      /home/ubuntu/erigon/turbo/snapshotsync/freezeblocks/block_snapshots.go:1544 +0x1a7
  github.com/erigontech/erigon/eth/stagedsync.DownloadAndIndexSnapshotsIfNeed()
      /home/ubuntu/erigon/eth/stagedsync/stage_snapshots.go:284 +0x159e
  github.com/erigontech/erigon/eth/stagedsync.SpawnStageSnapshots()
      /home/ubuntu/erigon/eth/stagedsync/stage_snapshots.go:175 +0x1ce
  github.com/erigontech/erigon/eth/stagedsync.DefaultStages.func1()
      /home/ubuntu/erigon/eth/stagedsync/default_stages.go:49 +0x152
  github.com/erigontech/erigon/eth/stagedsync.(*Sync).runStage()
      /home/ubuntu/erigon/eth/stagedsync/sync.go:529 +0x285
  github.com/erigontech/erigon/eth/stagedsync.(*Sync).Run()
      /home/ubuntu/erigon/eth/stagedsync/sync.go:413 +0x593
  github.com/erigontech/erigon/turbo/stages.ProcessFrozenBlocks()
      /home/ubuntu/erigon/turbo/stages/stageloop.go:143 +0xcf
  github.com/erigontech/erigon/turbo/stages.StageLoop()
      /home/ubuntu/erigon/turbo/stages/stageloop.go:78 +0x116
  github.com/erigontech/erigon/eth.(*Ethereum).Start.gowrap2()
      /home/ubuntu/erigon/eth/backend.go:1559 +0x144

Goroutine 1911 (running) created at:
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.pumpStreamLoop[go.shape.*uint8]()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:241 +0x4aa
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.SentryReconnectAndPumpStreamLoop[go.shape.*uint8]()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:196 +0x8e7
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).RecvUploadHeadersMessageLoop()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:114 +0x297
  github.com/erigontech/erigon/p2p/sentry/sentry_multi_client.(*MultiClient).StartStreamLoops.gowrap3()
      /home/ubuntu/erigon/p2p/sentry/sentry_multi_client/sentry_multi_client.go:81 +0x6e

Goroutine 133 (running) created at:
  github.com/erigontech/erigon/eth.(*Ethereum).Start()
      /home/ubuntu/erigon/eth/backend.go:1559 +0xe1e
  github.com/erigontech/erigon/node.(*Node).Start()
      /home/ubuntu/erigon/node/node.go:129 +0x3b3
  github.com/erigontech/erigon/node.StartNode()
      /home/ubuntu/erigon/node/node.go:419 +0x2e
  github.com/erigontech/erigon/turbo/node.(*ErigonNode).run()
      /home/ubuntu/erigon/turbo/node/node.go:69 +0x9b
  github.com/erigontech/erigon/turbo/node.(*ErigonNode).Serve()
      /home/ubuntu/erigon/turbo/node/node.go:49 +0x86
  main.runErigon()
      /home/ubuntu/erigon/cmd/erigon/main.go:103 +0x685
  github.com/erigontech/erigon/turbo/app.MakeApp.func1()
      /home/ubuntu/erigon/turbo/app/make_app.go:71 +0x17e
  github.com/urfave/cli/v2.(*Command).Run()
      /home/ubuntu/go/pkg/mod/github.com/urfave/cli/v2@v2.27.2/command.go:276 +0x1578
  github.com/urfave/cli/v2.(*App).RunContext()
      /home/ubuntu/go/pkg/mod/github.com/urfave/cli/v2@v2.27.2/app.go:333 +0x1274
  github.com/urfave/cli/v2.(*App).Run()
      /home/ubuntu/go/pkg/mod/github.com/urfave/cli/v2@v2.27.2/app.go:307 +0xc8
  main.main()
      /home/ubuntu/erigon/cmd/erigon/main.go:51 +0x8d
==================
stevemilk commented 1 month ago

I read the aggregator-related files and found a (possible) issue.

Here, BeginFilesRo can run between the check if refCnt == 0 && src.canDelete.Load() and the call to src.closeFilesAndRemove(), which may cause a nil-pointer dereference.

        refCnt := src.refcount.Add(-1)
        //GC: last reader responsible to remove useless files: close it and delete
        if refCnt == 0 && src.canDelete.Load() {
            if traceFileLife != "" && tx.ap.filenameBase == traceFileLife {
                tx.ap.logger.Warn("[agg.dbg] real remove at AppendableRoTx.Close", "file", src.decompressor.FileName())
            }
            src.closeFilesAndRemove()
        }

I haven't read every detail, so this is just a suspicion. Maybe it never happens.

Continue working on #11417...

AskAlexSharov commented 1 month ago

Maybe you're right. The assumption is that files with canDelete.True() will not be visible to new readers (see recalcVisibleFiles).

Anyway, we need to use atomic.Int and maybe add a runtime check for a negative refcnt.
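A runtime check for a negative refcnt could look roughly like this. It is only a sketch: `refCounter`, `acquire` and `release` are hypothetical names, and panicking on underflow is one possible policy (logging would be another).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounter is a hypothetical wrapper around an atomic reader count.
type refCounter struct{ n atomic.Int32 }

func (r *refCounter) acquire() { r.n.Add(1) }

// release decrements the counter and reports whether the caller was the
// last reader. It panics if the counter drops below zero, which would
// mean some Close was called without a matching Begin.
func (r *refCounter) release() bool {
	n := r.n.Add(-1)
	if n < 0 {
		panic(fmt.Sprintf("refcount underflow: %d", n))
	}
	return n == 0
}

func main() {
	var r refCounter
	r.acquire()
	fmt.Println(r.release()) // prints "true"

	defer func() { fmt.Println("recovered:", recover()) }()
	r.release() // unbalanced release: panics with "refcount underflow: -1"
}
```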

AskAlexSharov commented 1 month ago

@stevemilk FYI: I have perf requirements for agg.BeginRo - it must be lock-free and zero-allocation, because it’s what will be called on each RPC request. And we have tons of parallel RPS (throughput), because “node providers” are using us.

So I believe that in the far future RoSnapshots (introduced in E2) will also have a zero-alloc lock-free View method (“lock-free”: atomics are fine, RwMutex is not). That’s for the future, not for now.

This is the reason why I want the separation of visible/dirty files: then writes to visibleFiles will happen in only 1 atomic place: recalcVisibleFiles()
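One way to check the zero-allocation part of that requirement is `testing.AllocsPerRun` from the standard library. The sketch below assumes the visible list is published through an `atomic.Pointer`; `snapshots`, `visibleFile`, `view` and `viewAllocs` are hypothetical names, not the real RoSnapshots API.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// visibleFile is a hypothetical entry of the published file list.
type visibleFile struct{ name string }

// snapshots publishes an immutable list through an atomic pointer.
type snapshots struct {
	visible atomic.Pointer[[]visibleFile]
}

// view returns the published slice header: no lock, no copy.
func (s *snapshots) view() []visibleFile {
	if p := s.visible.Load(); p != nil {
		return *p
	}
	return nil
}

// viewAllocs reports the average number of heap allocations per view() call.
func viewAllocs(s *snapshots) float64 {
	total := 0
	allocs := testing.AllocsPerRun(1000, func() {
		total += len(s.view()) // use the result so the call is not optimized away
	})
	_ = total
	return allocs
}

func main() {
	s := &snapshots{}
	files := []visibleFile{{name: "a"}, {name: "b"}}
	s.visible.Store(&files)
	fmt.Println("allocs per view():", viewAllocs(s))
}
```

The same measurement can be placed in a regular `_test.go` file and turned into an assertion, so a later change that introduces a copy or a lock-protected clone is caught early.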

stevemilk commented 1 month ago

> The assumption is that files with canDelete.True() will not be visible to new readers (see recalcVisibleFiles).

Marking canDelete as true and calling recalcVisibleFiles are not one atomic step, so the assumption can be broken with small probability (a corner case).
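To make the corner case concrete, here is one hypothetical way to close the window with a compare-and-swap, so that a reader holding a stale list cannot pin a file whose removal has already been claimed. This is not what the issue proposes (the proposal relies on the visible list alone); `tryAcquire`, `release`, `markCanDelete` and the `claimed` sentinel are made-up names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// claimed is a sentinel refcount: removal has started, no new readers.
const claimed = -1

type fileItem struct {
	refs      atomic.Int32
	canDelete atomic.Bool
}

// tryAcquire adds a reader unless removal has already been claimed.
func (f *fileItem) tryAcquire() bool {
	for {
		n := f.refs.Load()
		if n < 0 {
			return false
		}
		if f.refs.CompareAndSwap(n, n+1) {
			return true
		}
	}
}

// release drops a reader and reports whether the caller must remove the file.
func (f *fileItem) release() bool {
	if f.refs.Add(-1) != 0 || !f.canDelete.Load() {
		return false
	}
	// Claim removal only if no reader slipped in after the decrement.
	return f.refs.CompareAndSwap(0, claimed)
}

// markCanDelete flags the file and reports whether the caller must remove
// it right away because there are no readers at all.
func (f *fileItem) markCanDelete() bool {
	f.canDelete.Store(true)
	return f.refs.CompareAndSwap(0, claimed)
}

func main() {
	f := &fileItem{}
	f.tryAcquire()                 // reader A pins the file
	fmt.Println(f.markCanDelete()) // prints "false": A still holds it
	fmt.Println(f.release())       // prints "true": A was last, A removes
	fmt.Println(f.tryAcquire())    // prints "false": a late reader is refused
}
```

The loop in `tryAcquire` uses only atomics, so it stays within the "atomics are fine, RwMutex is not" constraint mentioned earlier in the thread, at the cost of a possible retry under contention.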

> This is the reason why I want the separation of visible/dirty files: then writes to visibleFiles will happen in only 1 atomic place: recalcVisibleFiles()

I totally agree with the design. For the segments race issue, the separation plus #11430 can make indexed segments append-only and avoid the race. For the corner case above, the complexity comes from showing small files that are going to be merged. Would you consider making files visible only after they are merged or no longer need to be merged? If so, visible files become append-only.

AskAlexSharov commented 1 month ago

Files become visible after they are merged and indexed, and after _visible.Store(_dirty.calcVisibleFiles())
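A sketch of what a `calcVisibleFiles` step might do, following the three rules listed at the top of the issue (no `canDelete` files, no un-indexed files, no overlaps). The `fileItem` fields and the half-open `[from, to)` ranges are assumptions made for illustration, as is the sort order (by `to` ascending, then `from` descending, so that a merged file is processed after the small files it covers):

```go
package main

import (
	"fmt"
	"sort"
)

// fileItem is a hypothetical dirty-list entry covering [from, to).
type fileItem struct {
	from, to  uint64
	indexed   bool
	canDelete bool
}

// isSubsetOf reports whether f's range lies entirely inside g's range.
func (f fileItem) isSubsetOf(g fileItem) bool {
	return g.from <= f.from && f.to <= g.to
}

// calcVisibleFiles returns a NEW slice with garbage filtered out:
// files marked canDelete, un-indexed files, and files covered by a bigger one.
func calcVisibleFiles(dirty []fileItem) []fileItem {
	sorted := append([]fileItem(nil), dirty...) // never reorder the input
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].to != sorted[j].to {
			return sorted[i].to < sorted[j].to
		}
		return sorted[i].from > sorted[j].from
	})
	visible := make([]fileItem, 0, len(sorted))
	for _, f := range sorted {
		if f.canDelete || !f.indexed {
			continue
		}
		// Drop already-accepted smaller files that f fully covers.
		for len(visible) > 0 && visible[len(visible)-1].isSubsetOf(f) {
			visible = visible[:len(visible)-1]
		}
		visible = append(visible, f)
	}
	return visible
}

func main() {
	dirty := []fileItem{
		{from: 0, to: 1, indexed: true},
		{from: 1, to: 2, indexed: true},
		{from: 0, to: 2, indexed: true}, // merged file, covers the two above
		{from: 2, to: 3},                // not indexed yet
	}
	for _, f := range calcVisibleFiles(dirty) {
		fmt.Println(f.from, f.to) // prints "0 2"
	}
}
```

If the merged file were not indexed yet, the two small files would stay visible, which matches the statement that a file becomes visible only after it is merged and indexed.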

stevemilk commented 1 month ago

updates:

stevemilk commented 1 month ago

Hi sir @AskAlexSharov, the PR is here and ready for review. Please point out any areas that may not have been adequately considered. Thank you!