ethereum / hive

Ethereum end-to-end test harness
GNU General Public License v3.0
401 stars 449 forks source link

Logging backpressure #584

Open zilm13 opened 2 years ago

zilm13 commented 2 years ago

Hello I'm trying to run Hive Testsuite with high log level, DEBUG or TRACE and see huge performance degradation compared to INFO level. Not only the participation level drops from 0.97 to 0.90-0.93 but also client fails to reply to the requests from Testsuite to RPC API in 5 seconds timeout. As lower logging level works well with the same setup and logs for unresponsive container are stopped few seconds before the failure, I guess that the reason behind that is logging back pressure. Docker log output to stdout and stderr is blocking by default and when it unables to flush it in time, container stalls and resumes only after output is fetched (see docs). It's a highly repeatable issue for me with high log levels. I've tried to set docker logging to "non-blocking" mode but it doesn't help. Maybe the reason is elsewhere or I'm doing something wrong by patching this:

    c, err := b.client.CreateContainer(docker.CreateContainerOptions{
        Context: ctx,
        Config: &docker.Config{
            Image: imageName,
            Env:   vars,
        },
        HostConfig: &docker.HostConfig{
            LogConfig: docker.LogConfig{
                Config: map[string]string{
                    "mode":            "non-blocking",
                    "max-buffer-size": "4m",
                },
            },
        },
    })

Debugging tests with higher log levels looks valuable, but with blocking logging test fails early. Could you, please, help to make it working?

fjl commented 2 years ago

Hive captures client logs by attaching to the container via Docker API. I think docker does not / cannot apply this buffering to the attached connection. I will look into it.

fjl commented 2 years ago

Yes, it looks like the issue is that hive does not use the equivalent of 'docker logs' to get container logs. Instead, we attach to the container before it starts to capture all stdout/stderr.

I think we could change to grabbing the logs asynchronously for client containers using a different API.

fjl commented 1 year ago

Prototype implementation is here: https://github.com/ethereum/hive/pull/735

fjl commented 1 year ago

I cannot reproduce the original issue, my machine is too fast. I use this command to test:

./hive --sim eth2/testnet --client teku-bn,teku-vc,besu --sim.loglevel 5

The testnet participation is 0.97 and the PR makes no difference on the result.