containers / dnsname

name resolution for containers
Apache License 2.0
177 stars 47 forks source link

dnsname - persistent entries in /run/containers/cni/dnsname/podman/addnhosts #47

Closed aju11 closed 3 years ago

aju11 commented 3 years ago

Hello,

I have containers running in a podman environment with the dnsname plugin installed. These containers run on the default podman CNI network. On tearing down the containers (via a podman stop && podman rm ) however, there are persistent entries present in /run/containers/cni/dnsname/podman/addnhosts . There also are residual IPs in /var/lib/cni/networks/podman

Is there a workaround for this apart from manually clearing out both?

Here are some details about the podman environment:

# podman --version
podman version 1.6.4

# podman info
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.2.0+6369+1f4293b4.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: e33ff1d39b97fdec3963b8ae6621e2a235c1ac17'
  Distribution:
    distribution: '"rhel"'
    version: "8.2"
  MemFree: 9629265920
  MemTotal: 16643911680
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc10.module+el8.2.0+6369+1f4293b4.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: overcloudva7-afxctrl-0.5a4s9.englab.juniper.net
  kernel: 4.18.0-193.14.3.el8_2.x86_64
  os: linux
  rootless: false
  uptime: 70h 54m 30.29s (Approximately 2.92 days)
registries:
  blocked: null
  insecure:
  - 192.168.213.2:9797
  - 192.168.213.2:9798
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 2
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 18
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes
mheon commented 3 years ago

This doesn't sound like a dnsname plugin issue if there are still entries in /var/lib/cni/networks/podman. It sounds like network teardown is either not being called, or not working.

Can you try a newer Podman (RHEL 8.2.1 shipped 1.9, and 8.3 shipped 2.0.5)?

aju11 commented 3 years ago

Unfortunately, there's a hard requirement to use only version 1.9 for podman. I do a podman stop && podman rm to tear down the containers. Do you think this is not triggering the network cleanup somehow? Is there any way I can debug this for the current deployment? I cannot completely teardown the default CNI network as there may be other containers (not related to my deployment) still running on that network. The /etc/cni/net.d/87-podman-bridge.conflist currently looks like this:

    "cniVersion": "0.4.0",
    "name": "podman",
    "plugins": [
    {
            "type": "bridge",
            "bridge": "cni-podman0",
            "isGateway": true,
            "ipMasq": true,
            "ipam": {
        "type": "host-local",
        "routes": [
            {
            "dst": "0.0.0.0/0"
            }
        ],
        "ranges": [
            [
            {
                "subnet": "10.88.0.0/16",
                "gateway": "10.88.0.1"
            }
            ]
        ]
            }
    },
    {
            "type": "portmap",
            "capabilities": {
        "portMappings": true
            }
    },
        {
            "type": "dnsname",
            "domainName": "dns.podman"
        }
    ]
}

Thank you for your help!

baude commented 3 years ago

offhand i cannot remember if 1.6.4 supported dnsname? @mheon do you remember?

mheon commented 3 years ago

I don't see it in the release notes, so you'd have to pull out the commit logs to figure it out

baude commented 3 years ago

@aju11 on your initial report it seems you were using podman-1.6.4. Then in a later comment you mentioned that it was a requirement to use 1.9. For the record, can you confirm that you observe this behaviour with 1.9?

rhatdan commented 3 years ago

I do not believe dnsname was supported for 1.6.4.

codemaker219 commented 3 years ago

I have a similar problem with the current podman version 2.1.1.

I have defined two networks application and middleware.

Now I start a container attached to both networks. After the deletion of the container one of the addnhosts is not cleaned up.

How can I reproduce it Create two networks. I attached my config files.

Run

podman run -d --network application,middleware  --name c1 docker.io/nginx
podman rm -f c1

After that I exepect that the addnhosts has no c1 entries anymore. Unfortunately there is:

> cat /run/containers/cni/dnsname/middleware/addnhosts

100.65.0.6  c1
fdfb:991c:79de:6500::6  c1  

This is a problem because if I restart the container c1, another container c2 on the network middleware will resolve the old ip and then there is no route to host.

The network config files are as following:

application.conflist

{
   "cniVersion": "0.4.0",
   "name": "application",
   "plugins": [
      {
         "type": "bridge",
         "bridge": "cni-podman2",
         "isGateway": true,
         "ipMasq": true,
         "ipam": {
            "type": "host-local",
            "routes": [
               {
                  "dst": "0.0.0.0/0"
               }
            ],
            "ranges": [
               [
                  {
                     "subnet": "100.65.1.0/24",
                     "gateway": "100.65.1.1"
                  }
               ],
               [
                  {
                     "subnet":"fdfb:991c:79de:6501::/64",
                     "gateway":"fdfb:991c:79de:6501::1"
                  }
               ]
            ]
         }
      },
      {
         "type": "portmap",
         "capabilities": {
            "portMappings": true
         }
      },
      {
         "type": "firewall",
         "backend": "firewalld"
      },
      {
         "type": "dnsname",
         "domainName": "dns.podman.application"
      }
   ]
}

middleware.conflist

{
   "cniVersion": "0.4.0",
   "name": "middleware",
   "plugins": [
      {
         "type": "bridge",
         "bridge": "cni-podman1",
         "isGateway": true,
         "ipMasq": true,
         "ipam": {
            "type": "host-local",
            "routes": [
               {
                  "dst": "0.0.0.0/0"
               }
            ],
            "ranges": [
               [
                  {
                     "subnet": "100.65.0.0/24",
                     "gateway": "100.65.0.1"
                  }
               ],
               [
                  {
                     "subnet":"fdfb:991c:79de:6500::/64",
                     "gateway":"fdfb:991c:79de:6500::1"
                  }
               ]
            ]
         }
      },
      {
         "type": "portmap",
         "capabilities": {
            "portMappings": true
         }
      },
      {
         "type": "firewall",
         "backend": "firewalld"
      },
      {
         "type": "dnsname",
         "domainName": "dns.podman.middleware"
      }
   ]
}
baude commented 3 years ago

@codemaker219 has your issue always been when using two networks? if that is the case, then i think we know the problem.

codemaker219 commented 3 years ago

@baude At least I found out yesterday. So "has your issue always been when using two networks"-> Always since yesterday :-)

baude commented 3 years ago

i cannot reproduce this with 3.0. what distribution are you using?

codemaker219 commented 3 years ago

Centos 8.2

baude commented 3 years ago

@codemaker219 can you elaborate? are you using the obs repos or?

codemaker219 commented 3 years ago

It should be installed via the repo mentioned in the installtion instruction. https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/CentOS_8/devel:kubic:libcontainers:stable.repo I juts saw ther ist a new Version available. But on my setup there is the version 2.1.1

mheon commented 3 years ago

My strong suspicion continues to be that this is not a dnsname issue. Can you, on a clean system (no containers running), run a single container that exits immediately, but is not removed (podman run alpine ls should do nicely) and then check the output of mount for any nsfs mounts. They will look like the following:

nsfs on /run/netns/cni-dcbab49c-bc60-7523-e99a-3323220b0e43 type nsfs (rw)

codemaker219 commented 3 years ago
root@localhost:[~]: mount | grep nsfs
root@localhost:[~]: podman run alpine ls
Trying to pull registry.fedoraproject.org/alpine...
  manifest unknown: manifest unknown
Trying to pull registry.access.redhat.com/alpine...
  name unknown: Repo not found
Trying to pull registry.centos.org/alpine...
  manifest unknown: manifest unknown
Trying to pull docker.io/library/alpine...
Getting image source signatures
Copying blob 596ba82af5aa done
Copying config 7731472c3f done
Writing manifest to image destination
Storing signatures
bin
dev
etc
home
lib
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
root@localhost:[~]: mount | grep nsfs
nsfs on /run/netns/cni-932079bb-c138-5f13-ff81-e67b4de94e1d type nsfs (rw,seclabel)
nsfs on /run/netns/cni-932079bb-c138-5f13-ff81-e67b4de94e1d type nsfs (rw,seclabel)
mheon commented 3 years ago

Network teardown is not happening. This is a Podman issue, not a dnsname issue.

rhatdan commented 3 years ago

Since this is a Podman issue and has been fixed in the main branch, I am going to close. It would be best if the version of Podman you are using does not handle this properly to open a bugzilla or distro error, to get a version of podman with the fixes.