coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/
Podman quadlet network unstable #1577

Open haleksandre opened 11 months ago

haleksandre commented 11 months ago

Describe the bug

Rootless containers on the same quadlet network have unstable connectivity to each other. Name resolution sometimes succeeds and sometimes fails.

Reproduction steps

  1. Start a few rootless quadlet containers that connect to each other within the same quadlet network.

Expected behavior

Containers should be able to maintain connectivity with each other within the same network.

Actual behavior

Containers do not keep connectivity with each other, which causes 50x/Resource temporarily unavailable errors on requests from one container to another.

System details

Bare Metal Fedora CoreOS 38.20230819.3.0

Butane or Ignition config

variant: fcos
version: 1.5.0
passwd:
  users:
    - name: core
      groups:
        - sudo
        - docker
      # password_hash: # ....hash
      # ssh_authorized_keys:
        # ...keys
      home_dir: /home/core
storage:
  disks:
    - device: /dev/disk/by-id/coreos-boot-disk
      wipe_table: false
      partitions:
        - label: root
          number: 4
          # Allocate at least 8 GiB to the rootfs. See NOTE above about this.
          size_mib: 10240
          resize: true
        - label: swap
          # Allocate 16 GiB to swap
          start_mib: 0
          size_mib: 16384
          resize: true
        - label: var
          start_mib: 0
          size_mib: 0
  filesystems:
    - device: /dev/disk/by-partlabel/swap
      format: swap
      wipe_filesystem: true
      with_mount_unit: true
    - device: /dev/disk/by-partlabel/var
      path: /var
      format: xfs
      with_mount_unit: true
    # ....filesystems
  directories:
    - path: /etc/nginx/conf.d
      mode: 0755
      user:
        name: core
      group:
        name: core
      overwrite: true
    # ...directories
  files:
    - path: /etc/nginx/conf.d/default.conf
      mode: 0664
      user:
        name: core
      group:
        name: core
      contents: 
        local: # ...local content
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: media
    - path: /etc/containers/systemd/users/server.network
      contents:
        inline: |
          [Network]
          Label=lan.media.Network=server.network
    - path: /etc/containers/systemd/users/nginx.volume
      contents:
        inline: |
          [Volume]
          Label=lan.media.volume=nginx
    - path: /etc/containers/systemd/users/nginx.container
      contents:
        inline: |
          [Unit]
          Description=Nginx Quadlet
          Requires=podman.socket
          After=podman.socket

          [Container]
          Image=docker.io/nginxproxy/nginx-proxy:alpine
          ContainerName=nginx
          AutoUpdate=registry
          Environment=DOCKER_HOST=unix://${XDG_RUNTIME_DIR}/podman/podman.sock
          Network=server.network
          PublishPort=80:80/tcp
          PublishPort=443:443/tcp
          Volume=${XDG_RUNTIME_DIR}/podman/podman.sock:/tmp/docker.sock:ro
          Volume=nginx.volume:/usr/share/nginx/html
          Volume=/etc/nginx/conf.d/:/etc/nginx/conf.d/
          Volume=/etc/nginx/global/:/etc/nginx/global/
          Volume=/etc/nginx/ssl/:/etc/nginx/ssl/

          [Service]
          Restart=always
          TimeoutStartSec=900

          [Install]
          WantedBy=multi-user.target
      # ...other quadlet
# systemd:
#   units:
     # ...units

Additional information

For example, a request from container1 to container2:4000/ sometimes works and sometimes fails with a 50x error, because container1 seems intermittently unable to resolve the IP of container2. This 'unable to resolve' error also appears to occur at random, which adds to the confusion.
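
To put a number on the intermittent failures, something like the following could be run from inside container1. This is only a rough sketch: it assumes Python 3 is available in the container image (it may not be), and `probe_resolution`, its parameters, and the `container2` name are illustrative placeholders rather than anything from the actual setup.

```python
import socket
import time
from typing import Callable, Dict


def probe_resolution(
    hostname: str,
    attempts: int = 20,
    delay_s: float = 0.5,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, int]:
    """Resolve `hostname` repeatedly and count successes and failures.

    `resolver` is injectable so the loop can be exercised without real DNS.
    """
    counts = {"ok": 0, "failed": 0}
    for i in range(attempts):
        try:
            resolver(hostname)
            counts["ok"] += 1
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            counts["failed"] += 1
            print(f"attempt {i + 1}: {exc}")
        # Pause between attempts, but not after the last one.
        if delay_s > 0 and i < attempts - 1:
            time.sleep(delay_s)
    return counts
```

Calling `probe_resolution("container2")` (hypothetical peer name) returns a dict of success and failure counts. A non-zero `failed` count while container2 is running would point at name resolution rather than at the service on port 4000.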

travier commented 11 months ago

You should likely report that to podman upstream. If you could simplify the reproducer as much as possible, that might help with the investigation.