Open · jukefr opened this issue 1 year ago
Anecdotally, removing all of the `restart: always` entries from the compose file seems to make the issue go away (or at least I was not able to reproduce it; the issue does not happen 100% of the time).
```
podman start -a mailpit
Error: unable to start container: IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 125

podman start -a nginx
Error: unable to start container: IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 125
```
It doesn't seem like `podman-compose down` stops the instance:

```
rootlessp 774430 me 11u IPv6 3344113 0t0 TCP *:80 (LISTEN)
```
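For reference, a rough way to double-check this (the project label value is a placeholder, and the port is the one from the `lsof` line above):

```bash
# containers left over from the compose project, even after a "down"
podman ps -a --filter label=io.podman.compose.project=<project>

# anything still listening on the published port
sudo lsof -i -P -n | grep ':80'
```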
@francoism90 you seem to have many containers; there are no free IPs.
For example, you run a container, then stop it without removing it.
To confirm this, run:

```
podman ps -a
```

Just remove those stopped containers.
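For example, a minimal cleanup sketch (the container ID is a placeholder):

```bash
# list everything, including stopped containers
podman ps -a

# remove all stopped containers in one go
podman container prune

# or remove a specific one
podman rm <container-id>
```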
@muayyad-alsadi Thanks for your help. Unfortunately, stopping all containers doesn't help:

```
podman stop $(podman ps -q)
```

It keeps giving the same error when running `podman-compose up` again.
To confirm no instances are running:

```
$ podman ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
```

The only way to get it working again is to reboot.
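For reference, the kind of cleanup that would normally release the addresses without a reboot (it apparently does not help here, presumably because netavark's IPAM state is stale):

```bash
# tear down the compose project
podman-compose down

# remove any stopped containers that are still holding addresses
podman container prune

# remove unused networks so their IPAM ranges are freed
podman network prune
```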
@muayyad-alsadi More debug:
```
ERRO[0000] IPAM error: failed to get ips for container ID 09755daf17f0f85b5399e7c4bb264fdfca70c0a82a01494554e6547b6b39efa0 on network bridge
[ERROR netavark::network::bridge] failed to parse ipam options: no static ips provided
ERRO[0000] IPAM error: failed to find ip for subnet 10.89.0.0/24 on network hub_bridge
ERRO[0000] Unable to clean up network for container 09755daf17f0f85b5399e7c4bb264fdfca70c0a82a01494554e6547b6b39efa0: "tearing down network namespace configuration for container 09755daf17f0f85b5399e7c4bb264fdfca70c0a82a01494554e6547b6b39efa0: netavark: netavark encountered multiple errors:\n\t- IO error: aardvark pid not found\n\t- failed to delete container veth eth0: Netlink error: No such device (os error 19)"
Error: unable to start container 09755daf17f0f85b5399e7c4bb264fdfca70c0a82a01494554e6547b6b39efa0: IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
exit code: 12
```
I'm using this:

```yaml
networks:
  bridge:
    driver: bridge
```
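For what it's worth, the state of that network can be inspected directly; the name `hub_bridge` here is taken from the error output above, so adjust it to the actual project's network name:

```bash
podman network ls
podman network inspect hub_bridge
```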
What does `podman ps -a | wc -l` show?
@muayyad-alsadi

```
$ podman ps -a | wc -l
1
```
It seems to happen because I'm doing `podman-compose up` and `podman-compose down` a lot.
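A rough way to reproduce that pattern, if anyone wants to try (the iteration count is arbitrary):

```bash
# repeated up/down cycles until the IPAM error shows up
for i in $(seq 1 20); do
  podman-compose up -d || break
  podman-compose down
done
```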
I also ran into the issue that containers suddenly didn't start any more (while doing many `podman-compose up` and `podman-compose down` cycles), with errors like the ones above all over:
Error: unable to start container "fd9a50aff4025ada009883a6b67a3229611236191be7d0343d026dc24ab1d8e8": IPAM error: failed to find free IP in range: 10.89.0.1 - 10.89.0.254
Deleting the problematic network also didn't help, but what did help was to set it up by hand with a different IP range (not sure how long it will keep working, but at least for the moment it does):
```
podman-compose --version
['podman', '--version', '']
using podman version: 4.5.0
podman-composer version 1.0.3
podman --version
podman version 4.5.0
```

```
podman network rm reportportal_default
podman network create --driver bridge --subnet 192.168.33.0/24 --gateway 192.168.33.1 reportportal_default
```
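To confirm the recreated network actually picked up the new range, something like:

```bash
podman network ls
podman network inspect reportportal_default   # should show subnet 192.168.33.0/24
```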
@d0b3rm4n Yeah, the only workaround seems to be to reboot.
I don't know if any command exists to reset the network stack. It seems to be caused by containers not starting when using `depends_on`, and/or crashing due to other issues.
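One thing that might be worth checking before rebooting (a guess based on the "aardvark pid not found" and hung rootlessport observations above, not a confirmed fix) is whether per-user network helper processes were left behind:

```bash
# list any leftover rootless networking helpers for this user
pgrep -u "$USER" -af 'rootlessport|aardvark-dns'

# if some are still around with no containers running, kill them
pkill -u "$USER" -f rootlessport
pkill -u "$USER" -f aardvark-dns
```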
This is a repost from https://github.com/containers/podman/issues/17069; it turns out it might be compose-specific.
Issue Description
I'm trying to use traefik with podman (more specifically podman-compose); I'm not sure what specifically the issue is, or whether it is even related to traefik.
Here is what the compose file looks like:
```yaml
version: "3.3"
services:
  # IMPORTANT
  # Run commands with keep-id to make volume permissions correct and all truly rootless
  # podman-compose --podman-run-args="--userns=keep-id" [...]
  #
  # Forward traffic to right port with
  # iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 1024
  # iptables -A OUTPUT -t nat -p tcp --dport 80 -j REDIRECT --to-port 1024

  ###########################################################################
  # PROXY
  ###########################################################################
  traefik:
    user: "1000:1001"
    image: "docker.io/library/traefik"
    labels:
      - "io.containers.autoupdate=registry"
    restart: always
    command:
      #- "--log.level=DEBUG"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:1024" # HTTP
      - "--entrypoints.ssh.address=:1025" # GIT SSH
    ports:
      - "1024:1024"
      - "1025:1025"
      - "1026:8080"
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - "/run/user/1000/podman/podman.sock:/var/run/docker.sock:ro"

  # NOTE
  # Sometimes when shutting down the rootlessport process will hang for some reason
  # sudo lsof -i -P -n | grep $port
  # sudo kill $process_number

  # whoami:
  #   user: "1000:1001"
  #   image: "docker.io/traefik/whoami"
  #   labels:
  #     - "io.containers.autoupdate=registry"
  #     - "traefik.enable=true"
  #     - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
  #     - "traefik.http.routers.whoami.entrypoints=web"
  #     - "traefik.http.services.whoami-juke.loadbalancer.server.port=1024"
  #   command:
  #     - "--port=1024"
  #   restart: always

  ###########################################################################
  # NEXTCLOUD
  ###########################################################################
  # user
  # password
  # database
  # cloud.localhost
  nextcloud_database:
    user: "1000:1001"
    image: "docker.io/library/postgres:alpine"
    labels:
      - "io.containers.autoupdate=registry"
    restart: always
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - ./resources/postgres_alpine_passwd:/etc/passwd:ro
      - ./volumes/nextcloud_database:/var/lib/postgresql/data:Z
    environment:
      - POSTGRES_DB=database
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password

  nextcloud_server:
    user: "1000:1001"
    depends_on:
      - traefik
      - nextcloud_database
    image: "docker.io/library/nextcloud"
    labels:
      - "io.containers.autoupdate=registry"
      - "traefik.enable=true"
      - "traefik.http.routers.nextcloud_server.rule=Host(`cloud.localhost`)"
      - "traefik.http.routers.nextcloud_server.entrypoints=web"
      - "traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024"
    restart: always
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - ./resources/nextcloud_server_passwd:/etc/passwd:ro
      - ./resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro
      - ./volumes/nextcloud_server:/var/www/html:Z
    hostname: cloud.localhost
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=database
      - POSTGRES_USER=user
      - POSTGRES_HOST=nextcloud_database
      - NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost

  [...]
```

Everything seems to work fine when I run `podman-compose up`.

However, when I stop the services (with CTRL-C, or `podman-compose down`), I get the following error:
Sometimes it will look like this however:
At this point, if I run `lsof`, I see a process that I can kill.

But doing so still apparently leaves the system thinking that IP addresses are allocated when they shouldn't be, because trying to spin up the services again with `podman-compose up` results in the following error, saying that it's failing to find any free IPs.
Steps to reproduce the issue
- start the services with `podman-compose up` and press CTRL-C; the error happens
- alternatively, start the services in daemon mode (`up -d`) and destroy them and their volumes in another step (`down -v`); the same error happens

Describe the results you received
Stopping is not clean: it leaves hung processes, and IP addresses stay unavailable. The only way I found to fix it properly is to reboot the entire host.
Describe the results you expected
Not having hung processes that make it impossible to restart the pods because no more IPs are available, and not needing to reboot to get it to work.
podman info output
```shell
~ podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.3
Git Commit:   814b7b003cc630bf6ab188274706c383f9fb9915-dirty
Built:        Sun Nov 20 23:32:45 2022
OS/Arch:      linux/amd64

~ podman info
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.5-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: c9f7f19eb82d5b8151fc3ba7fbbccf03fdcd0325'
  cpuUtilization:
    idlePercent: 90.28
    systemPercent: 1.51
    userPercent: 8.2
  cpus: 8
  distribution:
    distribution: endeavouros
    version: unknown
  eventLogger: journald
  hostname: user-standardpcq35ich92009
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.1.4-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 4336590848
  memTotal: 8333340672
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.7.2-1
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.2.0-1
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 0
  swapTotal: 0
  uptime: 0h 20m 17.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 31523282944
  graphRootUsed: 12035514368
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 6
  runRoot: /run/user/1000/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668983565
  BuiltTime: Sun Nov 20 23:32:45 2022
  GitCommit: 814b7b003cc630bf6ab188274706c383f9fb9915-dirty
  GoVersion: go1.19.3
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
```

Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Happens both locally and inside a fresh VM.
Additional information
Ask if anything is unclear. None of the info in here is sensitive; passwords and such are mostly placeholders, so no need to worry.
Speaking of the fresh VM I mention in my latest reply, there's another variant of this issue I forgot to mention.
I have no clue what causes one issue or the other (the one in the first post with the panic) to happen, or why there seem to be two different possible outcomes.
I updated my original post to include the other error message.
Okay, still on the VM, after a fresh reboot again:

If I start the services with `podman-compose up` and force a graceful shutdown with a single CTRL-C, it seems to go through (sometimes it will error, sometimes it seems to go through, I have no clue why).

However, when I check with `lsof`, the rootlessport process seems to still be there.
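For reference, this is the check/kill sequence from the comments in the compose file above (the PID is whatever `lsof` reports):

```bash
# find the leftover rootlessport process holding the published ports
sudo lsof -i -P -n | grep rootlessport

# kill it by its PID
sudo kill <pid>
```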
Okay, so after trying some more things:

Running all of the commands that podman-compose runs, but manually, with a restart policy, seems to always work (as in, no error).

Here are the commands I used:
```bash
podman run --userns=keep-id --name=juke_traefik_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=traefik -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /run/user/1000/podman/podman.sock:/var/run/docker.sock:ro --net juke_default --network-alias traefik -p 1024:1024 -p 1025:1025 -p 1026:8080 -u 1000:1001 --restart always docker.io/library/traefik --api.insecure=true --providers.docker=true --providers.docker.exposedbydefault=false --entrypoints.web.address=:1024 --entrypoints.ssh.address=:1025 &\
podman run --userns=keep-id --name=juke_nextcloud_database_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/nextcloud_database:/var/lib/postgresql/data:Z --net juke_default --network-alias nextcloud_database -u 1000:1001 --restart always docker.io/library/postgres:alpine &\
podman run --userns=keep-id --name=juke_gitea_database_1 --label io.containers.autoupdate=registry --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_database -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/postgres_alpine_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_database:/var/lib/postgresql/data:Z --net juke_default --network-alias gitea_database -u 1000:1001 --restart always docker.io/library/postgres:alpine &\
podman run --userns=keep-id --name=juke_nextcloud_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.nextcloud_server.rule="Host(\`cloud.localhost\`)" --label traefik.http.routers.nextcloud_server.entrypoints=web --label traefik.http.services.nextcloud_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=nextcloud_server -e POSTGRES_PASSWORD=password -e POSTGRES_DB=database -e POSTGRES_USER=user -e POSTGRES_HOST=nextcloud_database -e NEXTCLOUD_TRUSTED_DOMAINS=cloud.localhost -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/nextcloud_server_passwd:/etc/passwd:ro -v /home/user/juke/resources/nextcloud_server_ports.conf:/etc/apache2/ports.conf:ro -v /home/user/juke/volumes/nextcloud_server:/var/www/html:Z --net juke_default --network-alias nextcloud_server -u 1000:1001 --restart always --hostname cloud.localhost docker.io/library/nextcloud &\
podman run --userns=keep-id --name=juke_gitea_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.gitea_server.rule="Host(\`code.localhost\`)" --label traefik.http.routers.gitea_server.entrypoints=web --label traefik.http.services.gitea_server-juke.loadbalancer.server.port=1024 --label traefik.tcp.routers.gitea_server_ssh.rule="HostSNI(\`*\`)" --label traefik.tcp.routers.gitea_server_ssh.entrypoints=ssh --label traefik.tcp.services.girea_server_ssh-juke.loadbalancer.server.port=1025 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=gitea_server -e HTTP_PORT=1024 -e DEFAULT_BRANCH=main -e RUN_MODE=prod -e DISABLE_SSH=false -e START_SSH_SERVER=true -e SSH_PORT=1025 -e SSH_LISTEN_PORT=1025 -e ROOT_URL=http://code.localhost -e GITEA__database__DB_TYPE=postgres -e GITEA__database__HOST=gitea_database:5432 -e GITEA__database__NAME=database -e GITEA__database__USER=user -e GITEA__database__PASSWD=password -e GITEA__service__DISABLE_REGISTRATION=true -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/gitea_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/gitea_server:/data:Z --net juke_default --network-alias gitea_server -u 1000:1001 --restart always docker.io/gitea/gitea:latest-rootless &\
podman run --userns=keep-id --name=juke_uptime_kuma_server_1 --label io.containers.autoupdate=registry --label traefik.enable=true --label traefik.http.routers.uptime_kuma_server.rule="Host(\`status.localhost\`)" --label traefik.http.routers.uptime_kuma_server.entrypoints=web --label traefik.http.services.uptime_kuma_server-juke.loadbalancer.server.port=1024 --label io.podman.compose.config-hash=123 --label io.podman.compose.project=juke --label io.podman.compose.version=0.0.1 --label com.docker.compose.project=juke --label com.docker.compose.project.working_dir=/home/user/juke --label com.docker.compose.project.config_files=docker-compose.yml --label com.docker.compose.container-number=1 --label com.docker.compose.service=uptime_kuma_server -e PUID=1000 -e PGID=1001 -e PORT=1024 -v /etc/timezone:/etc/timezone:ro -v /usr/share/zoneinfo/Europe/Paris:/etc/localtime:ro -v /home/user/juke/resources/uptime_kuma_server_passwd:/etc/passwd:ro -v /home/user/juke/volumes/uptime_kuma_server:/app/data:Z --net juke_default --network-alias uptime_kuma_server -u 1000:1001 --restart always --entrypoint '["node", "/app/server/server.js"]' docker.io/louislam/uptime-kuma &

podman stop -t 10 juke_uptime_kuma_server_1
podman stop -t 10 juke_gitea_server_1
podman stop -t 10 juke_nextcloud_server_1
podman stop -t 10 juke_gitea_database_1
podman stop -t 10 juke_nextcloud_database_1
podman stop -t 10 juke_traefik_1

podman rm juke_uptime_kuma_server_1
podman rm juke_gitea_server_1
podman rm juke_nextcloud_server_1
podman rm juke_gitea_database_1
podman rm juke_nextcloud_database_1
podman rm juke_traefik_1
```

However, as soon as I do the same thing with podman-compose, the error shows up again.
So this might actually be caused by podman-compose, or I am very unlucky and it only happens when using the podman-compose commands (the error does not occur 100% of the time in the first place).
Should this maybe be moved to the podman-compose repo?