attestantio / go-eth2-client

Apache License 2.0
102 stars 59 forks source link

Optional BeaconState usage in `Validators` #105

Closed moshe-blox closed 3 months ago

moshe-blox commented 5 months ago

Continuing from https://github.com/attestantio/go-eth2-client/pull/100, we've found that BeaconState calls in mainnet allocate ~1170 MiB memory and require ~520 MiB memory from the OS to execute.

Our mainnet SSV nodes on Kubernetes now request up to 2 GB memory (in spikes), when they previously only needed up to 0.7 GB, yet they're still querying the same number of validators (~3000).

I'm not ruling out potential optimizations which can be done in validatorsFromState, yet I believe allowing go-eth2-client users to opt-out of BeaconState calls for slower operation in return for less memory usage is a reasonable trade-off nonetheless.

Reproducible:

package main

import (
    "context"
    "fmt"
    "log"
    "runtime"
    "time"

    eth2client "github.com/attestantio/go-eth2-client"
    "github.com/attestantio/go-eth2-client/api"
    "github.com/attestantio/go-eth2-client/auto"
    "github.com/attestantio/go-eth2-client/spec/phase0"
    "github.com/rs/zerolog"
)

func main() {
    start := time.Now()
    ctx := context.Background()
    client, err := auto.New(
        ctx,
        auto.WithLogLevel(zerolog.DebugLevel),
        auto.WithAddress("http://localhost:5052"),
    )
    if err != nil {
        panic(err)
    }
    log.Printf("Connected within: %s", time.Since(start))

    start = time.Now()
    indices := []phase0.ValidatorIndex{}
    for i := 0; i < 3466; i++ {
        indices = append(indices, phase0.ValidatorIndex(i))
    }
    validators, err := client.(eth2client.ValidatorsProvider).Validators(ctx, &api.ValidatorsOpts{
        State:   "head",
        Indices: indices,
    })
    if err != nil {
        panic(err)
    }
    log.Printf("Validators: %d", len(validators.Data))
    log.Printf("Retrieved validators within: %s", time.Since(start))

    printMemUsg()
    runtime.GC()
    time.Sleep(10 * time.Second)
    printMemUsg()

    fmt.Println(len(validators.Data))
}

func printMemUsg() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    log.Printf("Alloc = %.2f MiB", bToMb(m.Alloc))
    log.Printf("TotalAlloc = %.2f MiB", bToMb(m.TotalAlloc))
    log.Printf("HeapIdle = %.2f MiB", bToMb(m.HeapIdle))
    log.Printf("HeapInuse = %.2f MiB", bToMb(m.HeapInuse))
    log.Printf("Sys = %.2f MiB", bToMb(m.Sys))
    fmt.Println()
}

func bToMb(b uint64) float64 {
    return float64(b) / 1024 / 1024
}

Output on a Linux machine with a mainnet Beacon node:

2024/02/04 09:27:20 Connected within: 7.059233ms
2024/02/04 09:27:22 Validators: 3466
2024/02/04 09:27:22 Retrieved validators within: 2.080944525s
2024/02/04 09:27:22 Alloc = 435.43 MiB
2024/02/04 09:27:22 TotalAlloc = 1170.45 MiB
2024/02/04 09:27:22 HeapIdle = 62.38 MiB
2024/02/04 09:27:22 HeapInuse = 436.65 MiB
2024/02/04 09:27:22 Sys = 519.52 MiB

2024/02/04 09:27:32 Alloc = 39.58 MiB
2024/02/04 09:27:32 TotalAlloc = 1170.45 MiB
2024/02/04 09:27:32 HeapIdle = 458.28 MiB
2024/02/04 09:27:32 HeapInuse = 40.75 MiB
2024/02/04 09:27:32 Sys = 519.77 MiB

3466
mcdee commented 4 months ago

After some consideration, I would rather avoid using a flag that is as focused as this. As the size of the beacon chain grows it's likely that we will have additional situations where we will make time/space trade-offs, so a more general flag would make sense.

So could we change the flag to be something like "WithReducedMemoryUsage(bool)", with the parameter defaulting to false (and a note that this may result in longer response times as a result)? That gives us a level of flexibility in future without needing additional flags, or to deprecate flags that relate to implementation details as/when the beacon API evolves.

In this particular situation, if the request is made for reduced memory usage then the chunked method will be used at for all requests other than a request for all validators, and otherwise the existing logic will be used.

mcdee commented 4 months ago

@moshe-blox are you okay to alter the PR as per the above comment?

nkryuchkov commented 3 months ago

@mcdee I have opened another PR that uses the approach you described: https://github.com/attestantio/go-eth2-client/pull/121

moshe-blox commented 3 months ago

Closing in favor of #121