containers / netavark

Container network stack
Apache License 2.0

Podman and Docker IPv6 Compatibility Differs #340

Open shawnweeks opened 4 years ago

shawnweeks commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Steps to reproduce the issue:

  1. Install CentOS 8 with IPv6 enabled

  2. Install the latest Podman

  3. Attempt to start Keycloak instance

    podman run --rm -p 8080:8080 -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=admin jboss/keycloak

Describe the results you received:

15:04:20,205 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC000001: Failed to start service org.wildfly.network.interface.private: org.jboss.msc.service.StartException in service org.wildfly.network.interface.private: WFLYSRV0082: failed to resolve interface private
    at org.jboss.as.server.services.net.NetworkInterfaceService.start(NetworkInterfaceService.java:96)
    at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1738)
    at org.jboss.msc.service.ServiceControllerImpl$StartTask.execute(ServiceControllerImpl.java:1700)
    at org.jboss.msc.service.ServiceControllerImpl$ControllerTask.run(ServiceControllerImpl.java:1558)
    at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1363)
    at java.lang.Thread.run(Thread.java:748)

15:04:20,234 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([("interface" => "private")]) - failure description: {"WFLYCTL0080: Failed services" => {"org.wildfly.network.interface.private" => "WFLYSRV0082: failed to resolve interface private"}}

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

[centos@cloudctl1 ~]$ podman --version
podman version 1.8.0
[centos@cloudctl1 ~]$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.12.12
  podman version: 1.8.0
host:
  BuildahVersion: 1.13.1
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module_el8.1.0+272+3e64ee36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: 7a4f0dd7b20a3d4bf9ef3e5cbfac05606b08eac0'
  Distribution:
    distribution: '"centos"'
    version: "8"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 64605507584
  MemTotal: 67204878336
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc9.module_el8.1.0+272+3e64ee36.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 33806086144
  SwapTotal: 33806086144
  arch: amd64
  cpus: 24
  eventlogger: journald
  hostname: cloudctl1.dev.example.com
  kernel: 4.18.0-147.5.1.el8_1.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.2-2.git21fdece.module_el8.1.0+272+3e64ee36.x86_64
    Version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  uptime: 20m 14.25s
registries:
  search:
  - registry.access.redhat.com
  - registry.fedoraproject.org
  - registry.centos.org
  - docker.io
store:
  ConfigFile: /home/centos/.config/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-1.module_el8.1.0+272+3e64ee36.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  GraphRoot: /home/centos/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 1
  RunRoot: /run/user/1000
  VolumePath: /home/centos/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.8.0-3.1.el8.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):

This works fine on Ubuntu 18.04 with docker-io version 18.09.7, build 2d0083d, and IPv6 enabled.

shawnweeks commented 4 years ago

https://unix.stackexchange.com/questions/566812/keycloak-failing-to-start-with-failed-to-resolve-interface-private/

mheon commented 4 years ago

Can you provide more details on the failure here? I don't think any of our developers are particularly familiar with Tomcat, so more details on what's going wrong would make this much easier.

Is Tomcat trying to bind to an IPv6 address inside the container?

shawnweeks commented 4 years ago

It's WildFly, not Tomcat, but I'm not exactly sure what the failure is. If I disable IPv6 on CentOS 8 via GRUB, it goes back to working. I'm trying to narrow it down now.

shawnweeks commented 4 years ago

Realized I wasn't testing against the same versions. The issue seems to be that Podman presents an IPv6 interface inside the container that might not actually allow binding, while Docker does not present one at all.

This is Docker-based:

[root@b21d2f5618f4 /]# cat /proc/net/if_inet6
[root@b21d2f5618f4 /]#

This is Podman-based:

[root@05498918caa8 /]# cat /proc/net/if_inet6
00000000000000000000000000000001 01 80 10 80       lo
fd00000000000000200cd3fffebd9434 02 40 00 00     tap0
fe80000000000000200cd3fffebd9434 02 40 20 80     tap0
[root@05498918caa8 /]#
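Each row of /proc/net/if_inet6 packs the address into 32 hex digits, followed by the interface index, prefix length, scope, and flags (all hex). The following helper (a hypothetical convenience for reading that output, not part of Podman or Docker) renders those rows as conventional colon-separated addresses:

```shell
# Decode /proc/net/if_inet6 rows into readable IPv6 addresses.
# Field order: packed-address ifindex prefix-len scope flags device.
fmt_if_inet6() {
  while read -r addr _ifidx prefix _scope _flags dev; do
    # Insert a colon after every 4 hex digits, then drop the trailing one.
    pretty=$(printf '%s' "$addr" | sed 's/..../&:/g; s/:$//')
    # prefix-len is hex in the kernel file; print it as decimal.
    printf '%s/%d dev %s\n' "$pretty" "0x$prefix" "$dev"
  done
}
```

Piped through this, the Podman output above shows a fd00::/64 unique-local address and a fe80::/64 link-local address on tap0, while the Docker container has no IPv6 addresses at all.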

github-actions[bot] commented 4 years ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 4 years ago

@shawnweeks @mheon Did we come to a conclusion that what Podman is doing is wrong? Or just different than Docker?

shawnweeks commented 4 years ago

I'm not sure. Docker disables IPv6 by default, while in Podman it's enabled and doesn't appear to work with things like the WildFly app server. I'm not sure IPv6 is actually broken; it might just be something about how WildFly tries to use it. WildFly runs fine with IPv6 if you run it directly on CentOS or RHEL.

rhatdan commented 4 years ago

Did you open this as an issue with WildFly?

shawnweeks commented 4 years ago

I've posted a question in their forum, but I suspect they're going to ask why it's their issue, since it works fine on Docker and on bare metal. Docker has taken the approach of disabling IPv6 inside containers, but bare metal with IPv6 enabled works fine.

rhatdan commented 4 years ago

@mccv1r0 WDYT?

rhatdan commented 4 years ago

@shawnweeks Could you give me a link to the WildFly issue, so I can watch it and participate?

mccv1r0 commented 4 years ago

It looks like the WildFly image needs work to support IPv6.

Docker doesn't enable IPv6 unless you set ipv6: true in /etc/docker/daemon.json. @shawnweeks, are you looking for something similar for Podman?
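For reference, the Docker-side switch mentioned here is a daemon-wide setting; a minimal /etc/docker/daemon.json sketch (the 2001:db8:1::/64 subnet is the RFC 3849 documentation prefix, used purely as a placeholder; substitute a subnet valid for your network):

```json
{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}
```

After a daemon restart, containers on the default bridge receive addresses from that subnet.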

aleks-mariusz commented 4 years ago

FWIW, I've seen issues with other workloads failing to resolve after I enabled IPv6 on the host and my network, because the container ended up with only an IPv6 nameserver (and IPv6 network access appears to be "not there yet" with Podman). Not sure this is the case here, but something to check is the contents of /etc/resolv.conf inside the container. The workaround I came up with is passing --dns 1.1.1.1 to podman to override the host's (IPv6-only) nameserver.
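Before reaching for the --dns override, it is easy to confirm whether this is the situation at hand. The check below is a hypothetical helper, not part of Podman; it succeeds only when a resolv.conf lists nameservers and all of them are IPv6:

```shell
# Exit 0 iff the file lists at least one nameserver and all of them are IPv6
# (an IPv6 nameserver address contains a colon).
only_v6_nameservers() {
  awk '/^nameserver/ { total++; if ($2 ~ /:/) v6++ }
       END { exit !(total > 0 && total == v6) }' "$1"
}
```

Run it against /etc/resolv.conf inside the container; if it succeeds, the --dns 1.1.1.1 workaround described above is a plausible fix.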

rhatdan commented 4 years ago

@mheon PTAL

ssbarnea commented 4 years ago

I think I am facing the same IPv6 issue; on Docker I was able to overcome it by keeping ipv6: false in its config. Still, with Podman I am unable to build Alpine containers, because apk fetch gets stuck talking to Alpine's own servers (due to broken IPv6 inside the containers).

Using alternative DNS servers did not work for me, as remote servers can also return IPv6 addresses (and I may need local DNS servers for some cases too).

It may be worth mentioning https://stackoverflow.com/questions/30750271/disable-ip-v6-in-docker-container/41497555#41497555, which is a workaround I tested for Docker and that worked for "run" (not build).

This is a very annoying issue because IPv6 works fine on the host, but its presence causes build failures inside containers.

Still, I was able to find an ugly hack for Alpine, --add-host dl-cdn.alpinelinux.org:1.2.3.4, to force an IPv4 address for it.

salifou commented 4 years ago

I was having the same issue ... adding -e BIND=127.0.0.1 worked for me.

yangm97 commented 4 years ago

I think it's important to note that the default configs (using the Ubuntu package here) are a bit contradictory.

There's no IPv6 config in a "stock" /etc/cni/net.d/87-podman-bridge.conflist, yet containers are created with an IPv6 link-local address for me.

So, to avoid issues like this: if Podman ships with IPv6 disabled, it shouldn't be creating those IPv6 addresses; but if IPv6 is supposed to be enabled by default, then Podman needs to ship a config which explicitly enables IPv6 addressing and internet connectivity. That way, disabling IPv6 becomes a matter of removing the IPv6 lines from the config, and the same goes for enabling/disabling IPv4.
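For concreteness, the explicit per-family config being asked for would live in the conflist's ipam section. A hedged sketch using CNI's host-local IPAM plugin with one v4 and one v6 range (the 2001:db8:1::/64 subnet is a documentation-prefix placeholder; as noted elsewhere in this thread, a real deployment needs a routable v6 subnet):

```json
{
  "type": "host-local",
  "routes": [{ "dst": "0.0.0.0/0" }, { "dst": "::/0" }],
  "ranges": [
    [{ "subnet": "10.88.0.0/16", "gateway": "10.88.0.1" }],
    [{ "subnet": "2001:db8:1::/64" }]
  ]
}
```

With this layout, disabling one address family is a matter of deleting its range and route lines, which is exactly the symmetry the comment above asks for.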

mheon commented 4 years ago

We've gone through this before, and came to the conclusion that there is no way of enabling IPv6 by default in the config. We'd need users to provide a routable IPv6 subnet for us to use internally, since NAT for v6 is strongly frowned upon - so we can't ship a default config that works everywhere.

At the same time, we have no easy way of disabling link-local addresses. CNI doesn't really expose knobs to tweak things at that level: we can control what CNI creates, but link-local addresses are assigned by the kernel automatically, and we don't really have a way of disabling that.

yangm97 commented 4 years ago

There's also a point to be made that many people have gotten used to the eggshell security that v4 NAT provides their containers, so globally addressable v6 containers would open up some passwordless Redis instances to the internet... yeah, I feel it.

But on the other hand, as much as I hate to say it, once the v4 internet comes to an end, there's no possible default configuration that doesn't include NAT for v6, since we can't just assume a developer machine is going to receive a routable subnet, and so on.

Correct me if I'm wrong, but it seems like we're all kind of postponing the inevitable.

shawnweeks commented 4 years ago

Out of curiosity, how is Docker disabling the IPv6 interface inside its containers? I thought it used similar Linux features. My base issue was that Podman presented an IPv6 interface inside the container that WildFly couldn't bind to, and it sounds like Podman doesn't even enable IPv6 by default, so the interface shouldn't even show up.

mheon commented 4 years ago

It's definitely a kernel feature that's available to them (because they wrote their own networking library) but not so much to us (because we're using an existing one, CNI). We could, of course, attempt to add this to CNI and contribute that change upstream, but we haven't had much luck in that area before.

dithotxgh commented 4 years ago

What about the case where a valid IPv6 prefix is available? This would require, of course, taking responsibility for installing and configuring $YOUR_FAV_FIREWALL.

Sounds like this is being precluded by policy ATM.

rhatdan commented 4 years ago

@mccv1r0 PTAL

rhatdan commented 3 years ago

@mheon @baude another networking issue.

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 3 years ago

@baude @mheon What should we do with this one?

mheon commented 3 years ago

Podman and IPv6 is a part of a larger discussion we need to have on our approach to networking going forward.

aleks-mariusz commented 3 years ago

> Podman and IPv6 is a part of a larger discussion we need to have on our approach to networking going forward.

Bumping for our regularly scheduled three-month-later follow-up, please.

rhatdan commented 3 years ago

@mheon any progress?

rhatdan commented 3 years ago

@Luap99 @mheon Does this require the network redesign?

mheon commented 3 years ago

Yes

TomSweeneyRedHat commented 3 years ago

And then some, I'm guessing.

yitingdc commented 2 years ago

Any update?

rhatdan commented 2 years ago

4.0 is supposed to ship in February. These issues will be tested against netavark to see whether they have been fixed.

mheon commented 2 years ago

No way this makes it into 4.0. In v4.1 we can investigate further to see what sysctl Docker is setting to disable IPv6 (easy) and the exact circumstances under which they decide to do this (hard), and then see about adding that to Netavark.

mheon commented 2 years ago

Transferring to Netavark

baude commented 1 year ago

@mheon @Luap99 what say you here ... are there still gaps to close?

mheon commented 1 year ago

Yes, we still aren't hard-disabling IPv6 when addresses aren't specified, AFAIK

mccv1r0 commented 1 year ago

> Yes, we still aren't hard-disabling IPv6 when addresses aren't specified, AFAIK

What is meant by "addresses aren't specified"?

It's common for IPv6 to use SLAAC to obtain an address, especially in places where RHEL has been used historically, e.g. data centers.

mheon commented 1 year ago

We should support that, but we also need a completely-disabled mode.

For autoconfig: @Luap99, does Podman always provide a v6 address for networks with a v6 subnet attached (and, obviously, no user-specified address)? It seems like we could just let autoconfig be the default for such cases, which takes us out of the picture as much as possible if admins want to do something fancy. The downside would be that getting the address assigned to the container is more difficult.

Luap99 commented 1 year ago

I am pretty sure we disable autoconfig right now in netavark based on user/customer requests.

We only use addresses specified in the network config (compare podman network inspect). AFAIK Docker sets the IPv6-disable sysctl in the container namespace when no IPv6 address is in the config. I don't see the value in that, but clearly, as this report shows (even if it is quite old by now), some applications simply misbehave when they see IPv6. I would hope Keycloak has fixed that by now, but who knows how many legacy applications will fail by default. There is a simple workaround: set the sysctl yourself: podman run --sysctl net.ipv6.conf.all.disable_ipv6=1 ...