TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Docker 'Bridge' networks degrade whole-system network throughput and may bottleneck benchmarks #6486

Open errantmind opened 3 years ago

errantmind commented 3 years ago

OS (Please include kernel version)

Linux -- Ubuntu 20.04.2 LTS -- 5.11.7-051107-generic

Expected Behavior

Docker network configuration does not significantly degrade / bottleneck network throughput

Actual Behavior

Docker reduces framework throughput by roughly 20% merely by being installed, even when not in use, and its default network configuration may also be bottlenecking the benchmark

Steps to reproduce behavior

  1. First, benchmark the desired framework with wrk and without Docker installed. Note the framework's throughput and latency
  2. Install docker-ce via apt (which also installs docker-ce-cli and containerd.io)
  3. Benchmark the framework as before, with Docker installed but not in use, and note the reduction in performance
  4. (optional) Run the framework inside a Docker container of choice (I used bullseye-slim). Run wrk on the host against the framework running in the container and note that performance is roughly the same as in the previous step. Also note this does not change much when specifying --network host
  5. Stop the Docker service, then uninstall Docker. Restart the system (this was necessary for me), then run the benchmark and notice performance is restored
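The steps above can be sketched as shell commands; the wrk parameters, endpoint, and port are assumptions, so adjust them for the framework under test:

```shell
# 1. Baseline without Docker installed (endpoint/port are assumptions)
wrk -t8 -c256 -d30s http://127.0.0.1:8080/plaintext

# 2. Install Docker (pulls in docker-ce-cli and containerd.io)
sudo apt install docker-ce

# 3. Re-run the same wrk command; throughput drops even though Docker is idle
wrk -t8 -c256 -d30s http://127.0.0.1:8080/plaintext

# 5. Stop and remove Docker, reboot, re-run wrk; throughput recovers
sudo systemctl stop docker
sudo apt purge docker-ce docker-ce-cli containerd.io
sudo reboot
```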

Other details and logs

In my tests, if you run docker network ls and see any bridge networks, including the default one, system-wide performance is degraded. The default bridge network cannot be removed by any normal means (i.e. docker network rm). If these steps are followed, the default bridge network can be removed. Run the benchmark after following the steps and notice performance is almost restored to 'non-Docker' levels. I still saw a 5% throughput degradation after removing this network, but it was much better than otherwise. Note that TechEmpower's toolset creates a bridge network called tfb, so I am fairly confident this is an issue worth discussing.
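For anyone reproducing this, a couple of commands make the state visible (these assume the stock Docker iptables integration on Linux):

```shell
# Any 'bridge' driver entry here means Docker's bridge/iptables
# machinery is active, whether or not containers are running
docker network ls --filter driver=bridge

# Docker-managed chains exist even with no containers running
sudo iptables -L DOCKER-USER -n

# 1 means bridged traffic is pushed through iptables (br_netfilter)
sysctl net.bridge.bridge-nf-call-iptables
```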

The basic explanation I could find for the reduction in performance is Docker's default network configuration: it creates a bridge network and enables iptables processing, which can slow down the whole system even when Docker is not in use. Other network configurations supposedly do not suffer from this issue, such as macvlan or ipvlan, although it may be good enough to just use --network host with no bridge networks in existence.
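As a sketch of the macvlan alternative mentioned above (the subnet, gateway, parent interface, network name, and image name are all assumptions for your environment):

```shell
# Create a macvlan network attached directly to the host NIC,
# bypassing the Linux bridge and its iptables hooks
docker network create -d macvlan \
  --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
  -o parent=eth0 macnet

# Containers on this network get their own address on the LAN
docker run --rm --network macnet my-framework-image
```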

NateBrady23 commented 3 years ago

@errantmind This is definitely something worth digging into deeper. @msmith-techempower and I looked at this a while back and did not find any performance degradation, though it's been some time (and several updates) since then, and things may certainly have changed. At the least, if this is the case, all frameworks should be affected equally.

We may not have time to take a look at this in the next couple of weeks, but feel free to drop more info here if you have it. Benchmarking logs with and without the default bridge would be helpful if you have them. Also curious whether you were doing this on a single machine or using a multi-machine setup like we do in our Citrine environment.

Thanks for the report!

errantmind commented 3 years ago

I'm doing this on a single machine, so that could be a factor. I have tried multiple frameworks, each of which experiences the degradation in network throughput (req/s) of about 20%, so you may be right that all frameworks are affected equally. However, I think it is worth looking into at some point because of how it might affect the top-end frameworks, which are already very close to each other in performance. Without this overhead (if it exists in your multi-machine environment), it may be possible for them to further differentiate. I'm working on a framework myself and am short on time, but after I get it submitted I'll try to provide some detailed logs.

Kogia-sima commented 3 years ago

Disabling userland proxy may alleviate this overhead.

https://franckpachot.medium.com/high-cpu-usage-in-docker-proxy-with-chatty-database-application-disable-userland-proxy-415ffa064955

billywhizz commented 3 years ago

> Disabling userland proxy may alleviate this overhead.
>
> https://franckpachot.medium.com/high-cpu-usage-in-docker-proxy-with-chatty-database-application-disable-userland-proxy-415ffa064955

Yes. Try setting "userland-proxy": false in your daemon.json (usually at /etc/docker/daemon.json) and restarting Docker. The overhead should be nowhere near 20% with the proxy disabled.
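For reference, a minimal /etc/docker/daemon.json with the proxy disabled (merge this key into any existing settings rather than replacing the file):

```json
{
  "userland-proxy": false
}
```

Then restart the daemon (sudo systemctl restart docker). With the userland proxy off, published ports are handled by iptables DNAT rules instead of a per-port docker-proxy process.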

sebastienros commented 2 years ago

At MS we run all the TE benchmarks with --network host for the same reasons.
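A minimal sketch of that setup (the image name, port, and wrk parameters are assumptions):

```shell
# Host networking: the container shares the host's network stack, so
# there is no bridge, no NAT, and no docker-proxy in the request path
docker run --rm --network host my-framework-image

# wrk then hits the host port directly
wrk -t8 -c256 -d30s http://127.0.0.1:8080/plaintext
```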