containers / dnsname

name resolution for containers
Apache License 2.0

Bug: massive latency impact when using dnsname (workaround in comment 3) #55

Closed dschier-wtd closed 3 years ago

dschier-wtd commented 3 years ago

Hi,

thanks for the very cool work and effort you are putting into podman. I have identified some very weird behavior when using podman in combination with the dnsname plugin.

It seems like there is a huge performance impact (roughly 150x slower responses) when using dnsname for name resolution instead of plain IPs or other DNS servers.

Step by Step

1. Create a test environment (in this case rootful)

$ sudo podman network create test01
/etc/cni/net.d/test01.conflist

$ sudo podman network inspect test01 | grep dns
        "domainName": "dns.podman",
        "type": "dnsname"

$ sudo podman container run -dt -P --name web01 --network test01 httpd

$ sudo podman container ls
CONTAINER ID  IMAGE                           COMMAND           CREATED         STATUS             PORTS                  NAMES
3261a7db67f6  docker.io/library/httpd:latest  httpd-foreground  13 seconds ago  Up 12 seconds ago  0.0.0.0:39323->80/tcp  web01
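For reference, the dnsname-relevant part of the generated conflist looks roughly like this (an abridged sketch; the other plugin entries and exact fields depend on the podman/CNI version):

$ cat /etc/cni/net.d/test01.conflist
{
   "name": "test01",
   "plugins": [
      ... bridge/portmap/firewall entries omitted ...
      {
         "type": "dnsname",
         "domainName": "dns.podman"
      }
   ]
}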

2. Testing container -> host -> container communication via IP

$ sudo podman container run --rm --network test01 fedora:33 bash -c "time curl 192.168.178.106:39323"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    45  100    45    0     0  45000      0 --:--:-- --:--:-- --:--:-- 45000

It works!

real    0m0.004s
user    0m0.001s
sys     0m0.002s

3. Testing container -> container communication via IP

$ sudo podman inspect web01 | grep IPAddress
            "IPAddress": "",
            "IPAddress": "10.89.0.11",

$ sudo podman container run --rm --network test01 fedora:33 bash -c "time curl 10.89.0.11"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    45  100    45    0     0  45000      0 --:--:-- --:--:-- --:--:-- 45000

It works!

real    0m0.004s
user    0m0.000s
sys     0m0.004s


4. Testing container -> host -> container communication via DNS

$ sudo podman container run fedora:33 bash -c "time curl nb01:39323"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    45  100    45    0     0  22500      0 --:--:-- --:--:-- --:--:-- 22500

It works!

real    0m0.006s
user    0m0.001s
sys     0m0.004s


5. Testing container -> container communication via dnsname

$ sudo podman container run --rm --network test01 fedora:33 bash -c "time curl web01"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    45  100    45    0     0     54      0 --:--:-- --:--:-- --:--:--    54

It works!

real    0m0.835s
user    0m0.003s
sys     0m0.003s


As you can see, steps 1 - 4 seem OK, but step 5 shows a real time that is 150x+ slower than the other examples. This is reproducible with all kinds of traffic as soon as dnsname name resolution is involved. In a construct like Nextcloud, where this adds up across multiple hops, you will see huge impacts:

user -> traefik -> nextcloud-web -> nextcloud-php -> nextcloud-db
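
To confirm that the extra time is spent in the name lookup rather than in the HTTP transfer itself, curl's -w timing variables give a quick breakdown (a minimal sketch, reusing web01 and the test01 network from the steps above):

$ sudo podman container run --rm --network test01 fedora:33 bash -c "
    # lookup vs. total time when connecting by IP
    curl -s -o /dev/null -w 'ip:   lookup %{time_namelookup}s  total %{time_total}s\n' 10.89.0.11
    # lookup vs. total time when connecting via the container name
    curl -s -o /dev/null -w 'name: lookup %{time_namelookup}s  total %{time_total}s\n' web01"

If DNS is to blame, time_namelookup should account for almost all of the ~0.8s difference.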


## Additional information

$ podman --version
podman version 2.2.1

$ rpm -qa | grep podman
podman-2.2.1-1.fc33.x86_64
podman-docker-2.2.1-1.fc33.noarch
podman-plugins-2.2.1-1.fc33.x86_64

$ rpm -qa | grep dnsmasq
dnsmasq-2.83-1.fc33.x86_64



It would be awesome to get some insights here. Maybe I am doing it wrong? Are there additional parameters needed?

Please also feel free to reach out to me for any additional information.
dschier-wtd commented 3 years ago

Update for a redo in the same container:

$ sudo podman run -it --name client01 --network test01 fedora:33 bash
[root@534fd4a9ce91 /]# time curl web01
<html><body><h1>It works!</h1></body></html>

real    0m0.948s
user    0m0.003s
sys 0m0.011s
[root@534fd4a9ce91 /]# time curl web01
<html><body><h1>It works!</h1></body></html>

real    0m0.954s
user    0m0.007s
sys 0m0.004s
[root@534fd4a9ce91 /]# time curl web01
<html><body><h1>It works!</h1></body></html>

real    0m0.813s
user    0m0.006s
sys 0m0.005s
dschier-wtd commented 3 years ago

Additional update / workaround:

Using the internal FQDN (so the resolver does not have to apply the search domain) solves the issue:

[root@95b42f3e3572 /]# time curl web01
<html><body><h1>It works!</h1></body></html>

real    0m0.974s
user    0m0.006s
sys 0m0.006s

[root@95b42f3e3572 /]# time curl web01.dns.podman
<html><body><h1>It works!</h1></body></html>

real    0m0.008s
user    0m0.003s
sys 0m0.005s

For me, this is good enough, but it may be worth inspecting how dnsname/dnsmasq resolve and prioritize search domains. Maybe resolution via the internet is tried first and times out, or something similar. Not sure.
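
One way to check that hypothesis is to look at what the container's resolver is actually configured with and to time the lookup itself, independent of curl (a sketch; whether dns.podman shows up in the search line depends on the podman version):

$ sudo podman container run --rm --network test01 fedora:33 bash -c "
    cat /etc/resolv.conf                  # which search domains and nameservers are set?
    time getent hosts web01               # short name, walks the search list
    time getent hosts web01.dns.podman    # FQDN, should be answered directly by dnsmasq
"

If only the short name is slow, the time is lost while the resolver tries other search domains or nameservers before dnsmasq answers for dns.podman.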

dschier-wtd commented 3 years ago

@baude I don't know if this impacts the docker-compose functionality of podman 3.0, but it may be worth a look. It is very common to define multiple networks in docker-compose and communicate via hostnames.

Luap99 commented 3 years ago

@daniel-wtd can you try with --dns-search dns.podman for the podman run command?

dschier-wtd commented 3 years ago

Hi,

I started both of the containers with --dns-search dns.podman. Please find the results below. Looking good.

sudo podman container run --rm --network test01 --dns-search dns.podman fedora:33 bash -c "time curl web01"

real    0m0.005s
user    0m0.002s
sys     0m0.003s
sudo podman container run --rm --network test01 --dns-search dns.podman fedora:33 bash -c "time curl example.com"

real    0m0.227s
user    0m0.002s
sys     0m0.004s
Luap99 commented 3 years ago

OK, I think we should add this automatically when dnsname is used. In order to do so, dnsname has to add the DNS search domain to the CNI result, and podman has to read the search domain and add it to resolv.conf.
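
In practice that would mean a container attached to test01 ends up with a resolv.conf along these lines, so the short name web01 expands to web01.dns.podman without any manual flags (a sketch of the intended end state; the nameserver addresses are assumptions based on the 10.89.0.x addresses above):

$ sudo podman container run --rm --network test01 fedora:33 cat /etc/resolv.conf
search dns.podman
nameserver 10.89.0.1        # dnsmasq started by dnsname, assumed to listen on the bridge gateway
nameserver 192.168.178.1    # upstream resolver taken from the host (example value)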

dschier-wtd commented 3 years ago

Sounds like a plan. There may be situations with multiple networks and multiple containers attached to them, and I am not sure if there are limitations in the resolvers (count of DNS search entries, DNS server entries).
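
For the classic glibc resolver, the limits I am aware of are roughly these (a sketch from memory with hypothetical values; please double-check resolv.conf(5) for the glibc version in the image):

# Rough glibc limits (version-dependent):
#   nameserver entries: only the first 3 are used (MAXNS)
#   search list: older glibc capped it at 6 domains / 256 characters
search dns.podman example.internal    # hypothetical combined search list
nameserver 10.89.0.1                  # dnsname/dnsmasq for the first network (assumed)
nameserver 10.89.1.1                  # dnsname/dnsmasq for a second network (hypothetical)
nameserver 192.168.178.1              # upstream resolver from the host (example)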

Luap99 commented 3 years ago

Note that dnsname currently only works for one attached network, see https://github.com/containers/podman/issues/8399, https://github.com/containers/podman/issues/9492 and #12

Luap99 commented 3 years ago

https://github.com/containers/dnsname/pull/57 and https://github.com/containers/podman/pull/9501 should fix this

rhatdan commented 3 years ago

@Luap99 can we close this issue now?

Luap99 commented 3 years ago

Yes

dschier-wtd commented 3 years ago

Thanks a lot everybody :)

akash0x53 commented 2 years ago

Why does this issue occur? Is there a way to reproduce it?