Open jellyfish-bot opened 1 year ago
[ramirogm] This has attached https://jel.ly.fish/f9dc24eb-389d-4e7a-905f-b33d6e4c2b5e
So if I understand correctly, the `ping zookeeper` above fails because ping does a reverse DNS lookup, which breaks if the container name is too long, even though the container name was never really used. Is that correct @cywang117 @ramirogm?
Instead of shortening the commit in the container name, which we cannot do right now, maybe we can tell the engine not to use the container name for DNS resolution? Not sure if that's an option.
I replied to the user in that ticket above, which corresponds to https://forums.balena.io/t/352814, but I wasn't able to reproduce the DNS resolution issue when switching the base image used by zookeeper from `registry.access.redhat.com/ubi8/ubi-minimal` to `balenalib/intel-nuc-ubuntu:bionic`. Perhaps this is related to the base image itself?
It's possible that this is a behavior of a specific build or version of `ping`.
`ubi-minimal` doesn't even ship with ping; I had to switch to the regular `ubi` image and install it with `yum install iputils`. The `ubi` image's ping version is `ping utility, iputils-s20180629`, and the other test services' ping versions are `ping utility, iputils-s20161105`.
Clarification -- I switched the image from `ubi-minimal` to `ubi` and installed `ping`, and was able to ping back and forth between all services. So it may just be a feature of `ubi-minimal`.
Closing as the issue isn't caused by the Supervisor, but by a specific base image.
Reopening as I can't determine quite yet whether the issue is due to the base image.
Hi @cywang117 @pipex
I'll try to explain in more detail what we found while working on the support thread:
The issue is related to the container name being longer than 63 chars, combined with the fact that the Docker embedded DNS resolver is used. The embedded DNS resolver logs an error for invalid DNS names that can be seen in `journalctl` output.
I can reproduce the example using `balenalib/intel-nuc-ubuntu:bionic` images.
I've enabled support on https://dashboard.balena-cloud.com/devices/29eba2b51334368a8be289dacc9c1b6b which is running:
```yaml
version: "2.1"
services:
  test1:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  test2:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  zookeeperXXXX64:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  zookeeperXXX63:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  zookeeperXX62:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
```
Once instantiated, I get the following - note the container names:
```
root@29eba2b:~# balena ps
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS                    PORTS   NAMES
585d8a54ef84   dc8eeceb4a7c   "tail -f /dev/null"      19 minutes ago   Up 19 minutes                     zookeeperXXX63_5840953_2408105_d8854ebebe2abb92c44a3670a8289067
69193ff9c140   dc8eeceb4a7c   "tail -f /dev/null"      19 minutes ago   Up 19 minutes                     test1_5840950_2408105_d8854ebebe2abb92c44a3670a8289067
ba613f04fd15   dc8eeceb4a7c   "tail -f /dev/null"      19 minutes ago   Up 19 minutes                     zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067
204f3c7f83d7   dc8eeceb4a7c   "tail -f /dev/null"      19 minutes ago   Up 19 minutes                     test2_5840951_2408105_d8854ebebe2abb92c44a3670a8289067
5bf3e162b263   dc8eeceb4a7c   "tail -f /dev/null"      19 minutes ago   Up 19 minutes                     zookeeperXXXX64_5840952_2408105_d8854ebebe2abb92c44a3670a8289067
a094162a5ac5   registry2.balena-cloud.com/v2/13623e9b65aa57a645205212c74f85ff   "/usr/src/app/entry.…"   37 minutes ago   Up 37 minutes (healthy)           balena_supervisor
```
where:
- `zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067` is 62 chars
- `zookeeperXXX63_5840953_2408105_d8854ebebe2abb92c44a3670a8289067` is 63 chars
- `zookeeperXXXX64_5840952_2408105_d8854ebebe2abb92c44a3670a8289067` is 64 chars
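As a quick sanity check, these lengths sit exactly at the 63-octet limit that RFC 1035 places on a single DNS label, which matches where `ping` starts failing below. A small Python sketch using the names from the `balena ps` output above:

```python
# Check the container names from the `balena ps` output against the
# 63-octet limit that RFC 1035 places on a single DNS label.
names = [
    "zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067",
    "zookeeperXXX63_5840953_2408105_d8854ebebe2abb92c44a3670a8289067",
    "zookeeperXXXX64_5840952_2408105_d8854ebebe2abb92c44a3670a8289067",
]
for name in names:
    fits = len(name) <= 63
    print(f"{len(name)} chars: {'fits in a DNS label' if fits else 'exceeds the 63-octet label limit'}")
```

Only the 64-char name exceeds the limit, so only that container's reverse lookup produces an invalid DNS response.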
If I ssh into the `test1` container and ping the three containers, pinging the ones whose container name length is 62 or 63 gives no error, but pinging the one whose container name length is 64 fails:
```
root@69193ff9c140:/# ping zookeeperXX62
PING zookeeperXX62 (172.17.0.6) 56(84) bytes of data.
64 bytes from zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067.ae8c6ddc272547a49531149bd2dd187f_default (172.17.0.6): icmp_seq=1 ttl=64 time=0.175 ms
64 bytes from zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067.ae8c6ddc272547a49531149bd2dd187f_default (172.17.0.6): icmp_seq=2 ttl=64 time=0.117 ms
64 bytes from zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067.ae8c6ddc272547a49531149bd2dd187f_default (172.17.0.6): icmp_seq=3 ttl=64 time=0.115 ms
^C
--- zookeeperXX62 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2080ms
rtt min/avg/max/mdev = 0.115/0.135/0.175/0.030 ms
root@69193ff9c140:/# ping zookeeperXXX63
PING zookeeperXXX63 (172.17.0.4) 56(84) bytes of data.
64 bytes from zookeeperXXX63_5840953_2408105_d8854ebebe2abb92c44a3670a8289067.ae8c6ddc272547a49531149bd2dd187f_default (172.17.0.4): icmp_seq=1 ttl=64 time=0.167 ms
64 bytes from zookeeperXXX63_5840953_2408105_d8854ebebe2abb92c44a3670a8289067.ae8c6ddc272547a49531149bd2dd187f_default (172.17.0.4): icmp_seq=2 ttl=64 time=0.137 ms
^C
--- zookeeperXXX63 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1023ms
rtt min/avg/max/mdev = 0.137/0.152/0.167/0.015 ms
root@69193ff9c140:/# ping zookeeperXXXX64
PING zookeeperXXXX64 (172.17.0.3) 56(84) bytes of data.
^C64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.138 ms
--- zookeeperXXXX64 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.138/0.138/0.138/0.000 ms
```
and if you check the last entries in `journalctl`, or `tail` it while trying `ping zookeeperXXXX64`, you see:

```
Dec 12 22:41:13 29eba2b balenad[1343]: time="2022-12-12T22:41:13.242400385Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
```
A reference to this error was found by balena user hesch on https://forums.docker.com/t/reverse-dns-fails-dns-bad-rdata-if-container-name-is-larger-than-62-characters/107330
Based on that, I found the following in this Docker guide:

> Container name configured using --name is used to discover a container within an user-defined docker network. The embedded DNS server maintains the mapping between the container name and its IP address (on the network the container is connected to).
From there:
1) I was able to work around the issue by renaming the container to a shorter name. See https://jel.ly.fish/f9dc24eb-389d-4e7a-905f-b33d6e4c2b5e?event=1ac9f3c9-5e17-460a-a001-3042e25df423
2) The user found that "I checked why your commit UUID is shorter than mine and discovered that commit UUIDs from git push are longer than from balena push." That explained why the issue was hard to reproduce locally: when I executed `balena push`, the container name was shorter than 64 characters:

```
4164398d9f1f425c4a96546f2ba89e66          # (balena push)
72b853b44d0246fd789c89269db742d04a2e5ef9  # (git push)
```

The difference is 8 chars.
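The 8-char difference appears to come simply from the two commit formats: the `balena push` commit above is a 32-char hex string, while the `git push` commit is a 40-char git SHA-1. A throwaway check using the two commits above:

```python
# Compare the lengths of the two commit identifiers quoted above.
balena_commit = "4164398d9f1f425c4a96546f2ba89e66"        # from balena push
git_commit = "72b853b44d0246fd789c89269db742d04a2e5ef9"   # from git push
diff = len(git_commit) - len(balena_commit)
print(len(balena_commit), len(git_commit), diff)  # 32 40 8
```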
@cywang117 In your docker-compose, I think that what's happening is that if you `balena push` it, the container name will be shorter than 64 bytes, same as happened on my setup. Can you do a `balena ps` on your test device? If you make the container name longer, say from 'zookeeper' to 'zookeeper123456789', what happens?
Hi I'm hesch from the support thread in Balena forums. I completely support what @ramirogm wrote, it's not related to the ubi8 image, that was just a coincidence which led me to the wrong assumption at the beginning (the ubi8 containers had longer container names by coincidence).
@pipex I understand that limiting the container name to 63 chars will randomly cut off parts that you might need in other places of the code. From a short glance at the code I assume you use the commit id to distinguish different container commit versions. For that I'd suppose that a shortened commit ID should also provide safe collision avoidance. Something like this:

```
${serviceNamePart}_${imageIdPart}_${releaseIdPart}_${commitPrefix.substring(0, 8)}
```

That does not prevent the issue in all cases, but it makes it much less likely to appear. Just an idea that came to my mind without in-depth knowledge of the supervisor code.
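To illustrate the suggestion, here is a minimal Python sketch; the field order mirrors the template above, the helper name is made up, and the commit is the long git-push one from earlier in the thread:

```python
# Hypothetical sketch of the suggestion above: truncate the commit to its
# first 8 chars when building the container name.
def short_container_name(service_name, image_id, release_id, commit):
    return f"{service_name}_{image_id}_{release_id}_{commit[:8]}"

name = short_container_name("zookeeper", 5826140, 2403300,
                            "72b853b44d0246fd789c89269db742d04a2e5ef9")
print(name, len(name))  # zookeeper_5826140_2403300_72b853b4 34
```

With 8 hex chars there are over 4 billion possible prefixes, so accidental collisions between releases of the same service would be unlikely, though (as noted) not impossible.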
Changing the names the Docker/Balena DNS resolves so that it doesn't use the container name but only the alias sounds good to me but I don't know about ease of implementation and side effects.
> For that I'd suppose that also a shortened commit ID should provide safe collision avoidance. Something like that:
@hs-neax unfortunately it's not just about collision avoidance in the supervisor code. The release uuid / commit is an identifier used when reporting information to the API, which means that we would also need to modify the API to accept the shortened commit. Removing the `imageId` and `releaseId` is on our roadmap, which would also solve the issue, and although that also requires API-side changes, it has a bunch of other benefits that make the change more interesting.
> Changing the names the Docker/Balena DNS resolves so that it doesn't use the container name but only the alias sounds good to me but I don't know about ease of implementation and side effects.
Yeah, I took a quick glance and I haven't found a way to do it via docker options, so that might be a dead end
@ramirogm From that guide, in the table under the `--name` row, there's the following for aliases:

> In addition to --name as described above, a container is discovered by one or more of its configured --network-alias (or --alias in docker network connect command) within the user-defined network. The embedded DNS server maintains the mapping between all of the container aliases and its IP address on a specific user-defined network. A container can have different aliases in different networks by using the --alias option in docker network connect command.
As services are created with aliases too, it sounds like this would work as well.
I modified the docker-compose above to run with the Supervisor in local mode, where service names are suffixed with `_1_1_localrelease` or similar:
```yaml
version: "2.1"
services:
  test1:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  test2:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  zookeeperXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX64:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  zookeeperXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX63:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  zookeeperXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX62:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
```
When pinging the longest name from `test2`, the `balena` journal logs do indeed show the error `error writing resolver resp, dns: bad rdata`, but I waited a bit longer and pings started going through. It looks like it takes some time:
```
root@nuccc:~# balena exec test2_2_1_localrelease ping zookeeperXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX64
PING zookeeperXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX64 (172.22.0.4) 56(84) bytes of data.
64 bytes from 172.22.0.4: icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from 172.22.0.4: icmp_seq=2 ttl=64 time=0.136 ms
64 bytes from 172.22.0.4: icmp_seq=3 ttl=64 time=0.081 ms
```
Try waiting for a bit after pinging the service with the long name, and see if the behavior is the same for you.
I closed this issue before because I didn't think it was related to container names; I still think it's not, because aliases according to the docker guide are also known to the DNS resolver. Perhaps it's the order of resolution that causes the error logs in the journal -- when pinging by service name, the resolver will always first resolve by full container name, and that's where the error appears. Thoughts?
Hi @cywang117
From what we found, what fails is the reverse name lookup. In the following example, I ran `ping`, `nslookup`, and `dig`. All three fail to perform the reverse DNS lookup, with the error showing up in `journalctl` output. But they fail in different ways: `ping` gives up and prints the IP address, just as if you ran `ping -n`; `dig` and `nslookup` time out and fail since they can't really provide an answer. You can compare a successful ping, where the response prints the hostname, with a failed one that just prints the IP address.
```
root@e404e104187f:/# nslookup -x 172.17.0.3
;; communications error to 127.0.0.11#53: end of file
;; communications error to 127.0.0.11#53: end of file
root@e404e104187f:/# ping zookeeperXXXX64
PING zookeeperXXXX64 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.097 ms
[....]
64 bytes from 172.17.0.3: icmp_seq=15 ttl=64 time=0.127 ms
^C
--- zookeeperXXXX64 ping statistics ---
15 packets transmitted, 15 received, 0% packet loss, time 43330ms
rtt min/avg/max/mdev = 0.063/0.123/0.158/0.021 ms
root@e404e104187f:/#
root@e404e104187f:/# dig -x 172.17.0.3
; <<>> DiG 9.11.3-1ubuntu1.18-Ubuntu <<>> -x 172.17.0.3
;; global options: +cmd
;; connection timed out; no servers could be reached
```
```
Dec 13 17:54:22 29eba2b balenad[1343]: time="2022-12-13T17:54:22.739330603Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
```
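For context, what the resolver is answering in these `-x` runs is a PTR query, whose query name is just the reversed IP under `in-addr.arpa`; the failure is presumably on the answer side, where the 64-char container hostname doesn't fit in a single DNS label. A tiny sketch of the query-name construction, using the IP from the `dig -x` run above:

```python
# Build the reverse-lookup (PTR) query name for the IP used by `dig -x` above.
ip = "172.17.0.3"
ptr_query = ".".join(reversed(ip.split("."))) + ".in-addr.arpa"
print(ptr_query)  # 3.0.17.172.in-addr.arpa
```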
I see, thanks for clarifying @ramirogm. @pipex @kb2ma, thinking of solutions here, could we perhaps name a service by its docker-compose service name first, and alias it with the full `serviceName_imageId_releaseId_commit`, so that the service name gets used first when resolving? This would also make it easier to `balena exec` when debugging Supervisor-managed containers.
We use the container name to figure out the current state of the app install, and getting that info from the aliases might not be as easy. Also, the full container name avoids collisions when moving between releases, where you can have an old and a new copy of the service. For multi-app, we will definitely want to avoid collisions between similarly named services in different apps.
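To illustrate how the name carries state, here is a hypothetical Python sketch of parsing such a name back into its parts; the scheme is inferred from the names in this thread, not taken from the Supervisor's actual code:

```python
# Hypothetical parse of a supervisor-style container name into its parts.
# Splitting from the right keeps a serviceName that itself contains
# underscores intact; field meanings are inferred from this thread.
def parse_container_name(name):
    service_name, image_id, release_id, commit = name.rsplit("_", 3)
    return {"serviceName": service_name, "imageId": int(image_id),
            "releaseId": int(release_id), "commit": commit}

parts = parse_container_name(
    "zookeeperXX62_5840954_2408105_d8854ebebe2abb92c44a3670a8289067")
print(parts["serviceName"], parts["imageId"])  # zookeeperXX62 5840954
```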
I took a quick look at removing `imageId` and `serviceId` from the container name. The `imageId` could be removed without server-side changes, as it is really not used by the backend; the main limitation is that we return the `imageId` in some supervisor API methods. We might be able to circumvent that if we get the `imageId` for a given service from the target state of the app in the database, but if the database gets corrupted/deleted we won't have information to return. If the target state is a new release we won't have any information on the `imageId` either.
The `serviceId` cannot be changed without server-side changes; we would need to add a `serviceName` column in redis, and modify the UI to use the new `serviceName` field.
These are not small changes, but it would be great to get rid of these database-dependent ids, and it would help with this issue here.
That being said, I also like the idea of compressing release uuids. We would need to modify the target state patch to work with the shortened UUIDs, but it would save a few bytes of bandwidth per PATCH as well, which is nice
One more thing. For any changes to the container name, we also need to modify the way services are accessed via the web terminal
@pipex Thank you very much for the detailed feedback. Since there's apparently no quick fix available, I tested some workarounds. Since our software stack uses Kafka, which heavily depends on network communication between containers, this bug actually bricks our stack completely. The two production-grade workarounds I came up with are:

**Use `balena push`.** Potentially the easiest option. I'm not sure if mixing `git push` and `balena push` releases in one fleet is advisable, but it seems to work. The downside for us is losing the enforced connection to our git repo (with `git push` you can easily look up the git commit for an old release).
**Use short container names and network aliases.** We can get around the issue by using short container names, obviously. To avoid changing the names everywhere in the code (they should be an env var, but who knows) I tested network aliases. Here comes the confusing part for me. The documentation states:

> networks | Only support specifying network names

However, in this thread a merged PR is mentioned that implements network aliases. I tested it, and it seems like the following docker-compose.yml does work:
```yaml
version: "2.4"
services:
  test1:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  test2:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
  test3:
    image: balenalib/intel-nuc-ubuntu:bionic
    entrypoint: ["tail", "-f", "/dev/null"]
    networks:
      default:
        aliases:
          - test3_long_name
```
Seems like the documentation needs to be updated here. Can someone confirm that?
[ramirogm] Summary of issue discussed on a forums thread:
A container name longer than 63 chars is causing the issue. From this Docker guide:

> Container name configured using --name is used to discover a container within an user-defined docker network. The embedded DNS server maintains the mapping between the container name and its IP address (on the network the container is connected to).
A container was created with the name `zookeeper_5826140_2403300_72b853b44d0246fd789c89269db742d04a2e5ef9`, which is 66 characters long. When pinging that container, an error shows up in journalctl: