ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

Pebble compaction causes intermittent, but significant performance impacts #29575

Open riposteX opened 4 months ago

riposteX commented 4 months ago

System information

Geth version: v1.13.14
CL client & version: teku@24.1.3
OS & Version: Linux

Expected behaviour

Geth receives/processes blocks in a timely manner.

Actual behaviour

I run a number of geth/teku nodes and have recently noticed an infrequent (roughly daily) pattern occurring across all of them: geth receives a burst of about six blocks at around the same time, the oldest of which is ~72s stale.

It could of course just be a temporary network issue, but I keep seeing the same number of blocks pile up across multiple machines in multiple locations.

It could also be a Teku issue, but that seems a bit unlikely given the logs below.

Steps to reproduce the behaviour

I've added some custom logging in forkchoiceUpdated(): https://github.com/ethereum/go-ethereum/blob/823719b9e1b72174cd8245ae9e6f6f7d7072a8d6/eth/catalyst/api.go#L313

// Inside forkchoiceUpdated: flag any head block whose timestamp is more than
// 8 seconds in the past by the time the FCU arrives.
unixNow := uint64(time.Now().Unix())
blockHeader := block.Header()
if blockHeader.Time+8 < unixNow {
    fmt.Println("FCU received late block:", blockHeader.Number, unixNow-blockHeader.Time)
}

With this code in place, yesterday I got the output:

FCU received late block: 19680123 63
FCU received late block: 19680124 52
FCU received late block: 19680125 40
FCU received late block: 19680126 28
FCU received late block: 19680127 16

and corresponding Teku logs:

01:01:51.130 INFO  - Slot Event  *** Slot: 8882707, Block: ... empty, Justified: 277583, Finalized: 277582, Peers: 64
01:01:56.514 WARN  - Execution Client request timed out. Make sure the Execution Client is online and can respond to requests.
01:01:56.516 WARN  - Late Block Import *** Block: 719951b2cd1546b28744883f37736b44aac68a8db826a050eba30b0862a6ca17 (8882707) proposer 425691 arrival 1500ms, gossip_validation +7ms, pre-state_retrieved +4ms, processed +146ms, data_availability_checked +0ms, execution_payload_result_received +7857ms, begin_importing +0ms, completed +2ms
01:02:03.105 INFO  - Slot Event  *** Slot: 8882708, Block: ... empty, Justified: 277583, Finalized: 277582, Peers: 63
01:02:15.065 INFO  - Slot Event  *** Slot: 8882709, Block: ... empty, Justified: 277583, Finalized: 277582, Peers: 63
01:02:15.094 INFO  - Execution Client is responding to requests again after a previous failure
01:02:18.523 WARN  - Execution Client request timed out. Make sure the Execution Client is online and can respond to requests.
01:02:27.058 INFO  - Slot Event  *** Slot: 8882710, Block: ... empty, Justified: 277583, Finalized: 277582, Peers: 70
01:02:39.051 INFO  - Slot Event  *** Slot: 8882711, Block: ... empty, Justified: 277583, Finalized: 277582, Peers: 70
01:02:50.552 INFO  - Execution Client is responding to requests again after a previous failure
01:02:51.110 INFO  - Slot Event  *** Slot: 8882712, Block: ... empty, Justified: 277583, Finalized: 277582, Peers: 69
01:03:03.347 INFO  - Slot Event  *** Slot: 8882713, Block: e9a5299cdef23abf523c27235560777a176a614341703654aec7b3eab504f99e, Justified: 277583, Finalized: 277582, Peers: 69

I'm interpreting this as something infrequently hanging geth for ~1 min.

Some other recent incidents occurred at blocks 19678501 and 19675309. The signature is always a pileup of about six blocks on geth and a late-block message from Teku. Those late-block messages looked a bit different from the above, though:

12:51:51.989 WARN  - Late Block Import *** Block: b9b41da780df72f2b0d81e7e54820cfad87cf25107ba63fde3e147606bdb76d6 (8877857) proposer 241202 arrival 972ms, gossip_validation +6ms, pre-state_retrieved +10ms, processed +223ms, execution_payload_result_received +0ms, begin_importing +3778ms, completed +0ms

and

23:34:51.509 WARN  - Late Block Import *** Block: 23b5b12c8680fe980ee2f8bf8455b43b837181b9988fcf74d5bd73130b457057 (8881072) proposer 37541 arrival 488ms, gossip_validation +7ms, pre-state_retrieved +12ms, processed +163ms, execution_payload_result_received +0ms, begin_importing +3837ms, completed +2ms

I was running an older (and less verbose) version of Teku at the time; Lucas Saldanha from the Teku Discord told me that both of these blocks were late because of blob data unavailability.

karalabe commented 4 months ago

Could you share some logs from Geth when this happens?

riposteX commented 4 months ago

I usually run with --verbosity 1, so I have no past logs. Will try to capture another one.

riposteX commented 4 months ago

Seems to have magically stopped on its own; I haven't seen the characteristic 5-6 block pattern for a while now. Will keep an eye out and reopen if it returns.

riposteX commented 3 months ago

Did some more digging; the issue seems to be caused by Pebble's database compaction.

Not sure how/why it gets triggered, but the result is ~1 min of slow block processing.

Some recent examples where I've timed InsertBlockWithoutSetHead in newPayload:

NP received heavy block: 19858977 2.95236375s
NP received heavy block: 19858978 4.224653413s
NP received heavy block: 19858979 22.798706499s
NP received heavy block: 19858980 4.120076448s

NP received heavy block: 19859242 2.621948628s
NP received heavy block: 19859243 2.995023483s
NP received heavy block: 19859244 3.207645697s
NP received heavy block: 19859245 2.334093554s

NP received heavy block: 19860885 2.695685369s
NP received heavy block: 19860886 5.435836699s
NP received heavy block: 19860887 3.502931665s
NP received heavy block: 19860888 2.930462485s
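
A minimal sketch of the kind of timing wrapper that would produce output like the above (a hypothetical helper, not the actual instrumentation from the issue; the 2s reporting threshold is inferred from the numbers):

import (
    "fmt"
    "time"
)

// timeBlockInsert wraps a block-insert call and reports slow insertions.
// In eth/catalyst/api.go it would wrap the existing call to
// api.eth.BlockChain().InsertBlockWithoutSetHead(block) inside newPayload.
func timeBlockInsert(number uint64, insert func() error) error {
    start := time.Now()
    err := insert()
    // 2s threshold inferred from the outputs above, not from the issue.
    if elapsed := time.Since(start); elapsed > 2*time.Second {
        fmt.Println("NP received heavy block:", number, elapsed)
    }
    return err
}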

Found the culprit by profiling during the "bad" times, and confirmed, by adding logging here, that compaction is indeed being triggered: https://github.com/ethereum/go-ethereum/blob/44a50c9f96386f44a8682d51cf7500044f6cbaea/ethdb/pebble/pebble.go#L97
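
For reference, a rough sketch of the kind of compaction logging that can be hooked in through pebble's event listener (illustrative wording only; geth wires its own compaction callbacks into pebble.Options.EventListener near the location linked above, and field names may differ slightly across pebble releases):

import (
    "log"

    "github.com/cockroachdb/pebble"
)

// compactionLogger returns a pebble event listener that logs compaction
// start/end; assign it to pebble.Options.EventListener when opening the DB.
func compactionLogger() *pebble.EventListener {
    return &pebble.EventListener{
        CompactionBegin: func(info pebble.CompactionInfo) {
            log.Printf("pebble compaction started: job=%d reason=%s", info.JobID, info.Reason)
        },
        CompactionEnd: func(info pebble.CompactionInfo) {
            log.Printf("pebble compaction finished: job=%d took=%s", info.JobID, info.Duration)
        },
    }
}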

Saw some earlier optimizations around compaction (https://github.com/ethereum/go-ethereum/pull/20130); not sure if there's anything more that can be done to smooth these out as well.

Skird commented 2 months ago

I think I also face this issue on a larger scale while trying to sync a full archive node. The pattern is that the node imports blocks at a good pace for 10-15 minutes, showing around 20-100 Mgas/sec. After that, block import slows down to roughly 1-2 blocks per 12 seconds at around 1 Mgas/sec. This slow period lasts roughly 30 minutes (it's around block 11M right now, and might get even worse), then the cycle repeats.

I collected a pprof CPU profile, and during the slow period most time is spent in pebble.pickCompactionSeedFile(). It's pretty annoying, since neither the disk nor the CPU cores are fully utilized during this period; pebble seems to be doing some of its work single-threaded.

In my case, is there any way to make compactions less aggressive? I'd rather sacrifice some disk space to a non-compacted DB for longer in exchange for faster sync.

Skird commented 1 month ago

Update for those facing the same issue

  1. Tried running with DisableAutomaticCompactions: true set in pebble.Options. As expected, not a good idea: after ~1hr of work I saw severe degradation and all db operations seemed to hang. Alas, I found no other, more fine-grained compaction-related options in pebble.
  2. What I found to be actually helpful is manually compacting the DB using geth db compact (see the sketch after this list). It takes a few hours, during which the node is fully IO-bound, but after a full compaction sync works without degradation (also almost IO-bound, plus a fair CPU load) for a longer time (about 24hr). I got a stable ~100 Mgas/s, and even though the node isn't syncing during the compaction, overall it seems beneficial for faster sync. Not a viable solution if you want to track the tip of the chain with minimal latency, though.
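
For illustration, a rough sketch of both experiments against pebble directly (the path is hypothetical, and it must never point at a database a running geth instance has open; per the comments below, geth db compact performs the equivalent full-range compaction through geth's own ethdb layer):

package main

import (
    "bytes"
    "log"

    "github.com/cockroachdb/pebble"
)

func main() {
    opts := &pebble.Options{
        // Experiment 1: disable background compactions entirely. As noted
        // above, this eventually stalls all writes, so it is only illustrative.
        DisableAutomaticCompactions: true,
    }
    db, err := pebble.Open("/tmp/pebble-compaction-test", opts) // hypothetical path
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Experiment 2: force a manual compaction over the whole key space,
    // roughly what `geth db compact` triggers via db.Compact(nil, nil).
    // Pebble itself needs an explicit upper bound, so use a large key.
    end := bytes.Repeat([]byte{0xff}, 32)
    if err := db.Compact(nil, end, true); err != nil {
        log.Fatalf("manual compaction failed: %v", err)
    }
}
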
MariusVanDerWijden commented 1 month ago

There's not much we can do about performance degradation during compaction. Pebble already has much better compaction behaviour than LevelDB, but there is a significant performance hit inherent in the operation (shuffling the data around on disk). The thing that's curious to me is that the compaction seems to be single-threaded, which is weird since Pebble claims to support concurrent database compaction AFAIR. cc @rjl493456442
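
For what it's worth, pebble does expose a compaction-concurrency knob; a sketch of raising it when opening a database (illustrative only, not geth's actual configuration; in recent pebble releases the option is a func() int, older releases used a plain int, and whether it helps with these stalls is untested):

import (
    "runtime"

    "github.com/cockroachdb/pebble"
)

// compactionTunedOptions allows pebble to run one compaction per CPU core
// instead of the default.
func compactionTunedOptions() *pebble.Options {
    return &pebble.Options{
        MaxConcurrentCompactions: func() int { return runtime.NumCPU() },
    }
}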

Skird commented 1 month ago

Sharing a typical pprof profile from when pebble seems to be CPU-bound in one thread. My experience is that background compaction is somehow less efficient than calling db.Compact(nil, nil) the way geth db compact does: the latter utilizes much more CPU and IO. I understand there's probably not much geth can do about it, just sharing some practically acquired knowledge for posterity :)

pprof.geth.samples.cpu.007.pb.gz