containers / netavark

Container network stack

netavark dhcp proxy, could not find lease within the timeout limit #690

Closed Aetylus closed 1 year ago

Aetylus commented 1 year ago

Issue Description

Following the instructions detailed in the Podman documentation for Basic Networking, the attempt to run a container (webserver) in the network (webnetwork) results in the following error:

Error: netavark: unable to obtain lease: dhcp proxy error: status: Aborted, message: "Could not find a lease within the timeout limit", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sat, 06 May 2023 01:28:59 GMT", "content-length": "0"} }

Steps to reproduce the issue

Following the documentation exactly:

 $ sudo podman network create -d macvlan -o parent=eth0 webnetwork
 $ sudo systemctl enable --now netavark-dhcp-proxy.socket
 $ sudo podman run -dt --name webserver --network webnetwork quay.io/libpod/banner

Describe the results you received

I receive the error:

Error: netavark: unable to obtain lease: dhcp proxy error: status: Aborted, message: "Could not find a lease within the timeout limit", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sat, 06 May 2023 01:28:59 GMT", "content-length": "0"} }

Describe the results you expected

I expect the container to be created successfully.

podman info output

host:
  arch: amd64
  buildahVersion: 1.30.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.11
    systemPercent: 0.42
    userPercent: 0.46
  cpus: 1
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: server
    version: "38"
  eventLogger: journald
  hostname: beatrice
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.2.13-300.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 211496960
  memTotal: 1073811456
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.8.4-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.4
      commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-12.fc38.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 924839936
  swapTotal: 924839936
  uptime: 1h 8m 21.00s (Approximately 0.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/aetylus/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/aetylus/.local/share/containers/storage
  graphRootAllocated: 16039018496
  graphRootUsed: 8018710528
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/aetylus/.local/share/containers/storage/volumes
version:
  APIVersion: 4.5.0
  Built: 1681486942
  BuiltTime: Fri Apr 14 11:42:22 2023
  GitCommit: ""
  GoVersion: go1.20.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.5.0

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

This is running in a Hyper-V VM running Fedora Server 38, using an external network virtual switch.

Additional information

No response

Luap99 commented 1 year ago

Are you sure you have a dhcp server running on this interface?

Aetylus commented 1 year ago

Correct me if I'm misunderstanding, but doesn't netavark dhcp-proxy proxy the host's DHCP to the containers? I'm admittedly unsure whether this is the case, as the documentation here seemed unclear and only mentions DHCP in the following context:

The next step is to ensure that the DHCP service is running. This handles the DHCP leases from the network. [...] CNI and netavark both use their own DHCP service

If it is the case, then the host does have access to DHCP on its own interface. If not, is there additional documentation detailing proper configuration?

Luap99 commented 1 year ago

The netavark dhcp proxy requests a DHCP IP from the DHCP server on your network. So yes, it uses the same DHCP server as the eth0 interface, but it requests an IP for a different MAC address (the container's MAC), so your server has to respond to that request.

I don't know what Hyper-V does, but it could very well be that it ignores requests for other MAC addresses. Or maybe our DHCP request is confusing the server somehow? Or, as the error states, your server took too long to answer and we timed out. Can you check with something like tcpdump whether the DHCP request is being made and whether you see a response from the server?
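For example, something along these lines on the parent interface (eth0 here, taken from the reproduction steps) while starting the container should show whether a DHCP DISCOVER goes out and whether an OFFER comes back:

 $ sudo tcpdump -ni eth0 port 67 or port 68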

Aetylus commented 1 year ago

Thanks, that actually clued me in to the fact that the Hyper-V VM needed MAC address spoofing enabled so that traffic using the container's MAC address would be allowed through. After enabling it, it does look like the container was able to acquire an IP from the DHCP server.
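For anyone else hitting this: the setting is toggled on the Hyper-V host, for example with PowerShell (the VM name below is a placeholder):

 PS> Set-VMNetworkAdapter -VMName "fedora-server" -MacAddressSpoofing On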

The only other thing I'm noticing - and maybe this is outside the scope of this issue - is that while the container is accessible through its IP from the main host (that is, the Hyper-V host) and from other machines on the network, it's not accessible from the Podman host itself (the Fedora Server 38 VM). Is this expected behavior?

If it is, no worries, though I am curious why, as I would have assumed it should be accessible from both. If it isn't, I presume it might be caused by some unrelated networking configuration and I can investigate further.

[Edit] I found this article (https://blog.oddbit.com/post/2018-03-12-using-docker-macvlan-networks/):

With a container attached to a macvlan network, you will find that while it can contact other systems on your local network without a problem, the container will not be able to connect to your host (and your host will not be able to connect to your container). This is a limitation of macvlan interfaces: without special support from a network switch, your host is unable to send packets to its own macvlan interfaces.

While it was written in 2018 and relates to Docker, I presume this limitation is still relevant both today and for Podman.

Luap99 commented 1 year ago

Yes, macvlan bypasses any routing on the host for some reason. This is how the kernel implements it; there is nothing Podman or Docker can do to change that fact. We should probably make this clearer in our docs.
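For reference, the workaround usually suggested for this limitation is to give the host its own macvlan interface in bridge mode on the same parent and route container traffic through it; a rough sketch (the interface name, addresses, and container IP below are placeholders):

 $ sudo ip link add macvlan0 link eth0 type macvlan mode bridge
 $ sudo ip addr add 192.168.1.250/32 dev macvlan0
 $ sudo ip link set macvlan0 up
 $ sudo ip route add 192.168.1.123/32 dev macvlan0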

sarming commented 5 months ago

I think this might be a bug after all, since it is possible to reach containers from the host by attaching the host to the macvlan bridge (as opposed to the underlying interface). See e.g. https://wiki.archlinux.org/title/systemd-networkd#MACVLAN_bridge. In my case I can bring up a container using --ipam-driver=host-local and then reach the DHCP server from inside the container, while --ipam-driver=dhcp fails.
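For context, the two variants are created roughly like this (subnet and network names are placeholders; --ipam-driver is an option of podman network create):

 $ sudo podman network create -d macvlan -o parent=eth0 --subnet 192.168.1.0/24 --ipam-driver=host-local webnet-static
 $ sudo podman network create -d macvlan -o parent=eth0 --ipam-driver=dhcp webnet-dhcp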

I don't know if this helps, but using Wireshark I am seeing the DHCP request from the proxy on the underlying interface but not on the macvlan one.

Luap99 commented 5 months ago

@sarming I don't see how this is related to this issue. Please file a new one and provide the exact commands/setup to reproduce, and ideally do the packet dumps and see where the packets are being sent.