abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

Containers in emulated x86_64 VM become unreachable after a while (connection refused) #962

Open daniesso opened 8 months ago

daniesso commented 8 months ago

Description

After switching from an Intel Mac to an ARM Mac, the x86_64 Colima VM sometimes ends up in a broken state in which no containers are reachable: any request to an exposed port is answered with "connection refused". Restarting the containers or the VM has no effect; the only fix is some combination of recreating the VM and rebooting my Mac.

Version

colima version 0.6.7
git commit: ba1be00e9aec47f2c1ffdacfb7e428e465f0b58a

runtime: docker
arch: aarch64
client: v24.0.7
server: v24.0.7
limactl version 0.19.1
qemu-img version 8.2.0
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

Operating System

Output of colima status

colima status
INFO[0000] colima is running using macOS Virtualization.Framework 
INFO[0000] arch: aarch64                                
INFO[0000] runtime: docker                              
INFO[0000] mountType: virtiofs                          
INFO[0000] socket: unix:///Users/user/.colima/default/docker.sock
colima status x86
INFO[0000] colima [profile=x86] is running using macOS Virtualization.Framework 
INFO[0000] arch: x86_64                                 
INFO[0000] runtime: docker                              
INFO[0000] mountType: virtiofs                          
INFO[0000] socket: unix:///Users/user/.colima/x86/docker.sock 

Reproduction Steps

  1. I'm running two VMs, one aarch64 and one x86_64, created with colima start --cpu 4 --memory 16 --disk 100 and colima start --cpu 2 --memory 8 --disk 75 --profile x86 --arch x86_64, respectively.
  2. At first, everything works as expected. I switch between docker contexts and start and stop services in both VMs using docker compose.
  3. After some hours or days, the containers in the x86_64 VM stop responding to network requests. I can stop and start them without issue, and their logs contain no errors, but they cannot be reached. To test this, run docker run -p 8080:80 nginx and then telnet localhost 8080: it should answer "Connected to localhost" but instead answers "Connection refused".
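
The check in step 3 can be scripted so that a container that is still starting is not mistaken for the broken state. The following is a minimal sketch, not part of Colima: probe_port is a hypothetical helper name, the default retry count and delay are arbitrary, and it assumes nc is available on the host.

```shell
#!/bin/sh
# probe_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries a TCP connect so that a slow container start
# is not confused with a port that stays unreachable.
probe_port() {
  host=$1
  port=$2
  attempts=${3:-5}
  delay=${4:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # nc -z only attempts the connection; it sends no data.
    if nc -z "$host" "$port" >/dev/null 2>&1; then
      echo "reachable: $host:$port (attempt $i)"
      return 0
    fi
    i=$((i + 1))
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "unreachable: $host:$port after $attempts attempts"
  return 1
}

# Example (after docker run -d -p 8080:80 nginx):
#   probe_port localhost 8080
```

If the port is still unreachable after several attempts while docker ps shows the container as up, that matches the state described in this issue rather than a startup race.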

Expected behaviour

Healthy containers should be reachable.

Additional context

I'd appreciate any help with debugging this, and I'm happy to provide further information.

jmjoy commented 8 months ago

I encountered a similar problem:

https://github.com/apache/skywalking-php/actions/runs/7525408993/job/20481714358

docker ps looks fine, but connections are refused.

My CI tests on multiple macOS runners at the same time, and the problem occurs only occasionally.

dtrifiro commented 8 months ago

I'm seeing something similar: the published ports of containers (docker run -p ...) are not reachable.

As @jmjoy mentions, this does not happen 100% of the time, but it is frequent enough to consistently break CI in my use case: I'm running 4 macOS GitHub Actions runners, and we've consistently had at least 1 failure out of the 4 runs over the past month alone.

Edit: here's a minimal workflow that can reproduce the issue on GitHub Actions. If the issue is reproduced, it spawns a shell on the runner and prints the credentials you need to connect to it:

name: github actions testing

on:
  push:
  workflow_dispatch:

jobs:
  testing:
    runs-on: macos-latest

    steps:
      - name: install tmate
        run: |
          brew install tmate

      - name: Use colima as default docker host on MacOS
        run: |
          brew install docker
          colima start
          ls -la $HOME/.colima/default/docker.sock
          sudo ln -sf $HOME/.colima/default/docker.sock /var/run/docker.sock
          ls -la /var/run/docker.sock

      - name: run container
        run: |
          docker pull python:3.12-slim
          docker run -it -d -p 8000:8000 python:3.12-slim python -m http.server
          curl localhost:8000

      - name: run tmate
        if: failure()
        run: |
          tmate -F

daniesso commented 8 months ago

Downgrading to Colima 0.5.6 seems to have fixed this issue for me (I chose version 0.5.6 because my coworkers successfully run this version; I haven't tried any of the versions between 0.5.6 and 0.6.7, so I cannot point to a specific version where a regression in Colima may have occurred).

palmaguer commented 7 months ago

Same issue. I set up an APEX container emulating x86_64 on my M1 MacBook Pro. Everything was running, but I wanted to increase the CPU count of the Colima VM; after restarting, I could no longer access the database or the ORDS website.

~ % colima status
INFO[0000] colima is running using macOS Virtualization.Framework 
INFO[0000] arch: x86_64                                 
INFO[0000] runtime: docker                              
INFO[0000] mountType: virtiofs                          
INFO[0000] socket: unix:///Users/<username>/.colima/default/docker.sock 

I tried restarting the Colima VM and even re-creating it with different specs. I noticed that the Colima VM does not show any value in the ADDRESS column of colima ls, even if I pass --network-address.

~ % colima ls
PROFILE    STATUS     ARCH      CPUS    MEMORY    DISK     RUNTIME    ADDRESS
default    Running    x86_64    8       12GiB     60GiB    docker
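
With several profiles, rows whose ADDRESS column is empty can be picked out mechanically. The following is a minimal sketch, assuming the eight-column colima ls layout shown in this thread, where an empty ADDRESS leaves a row with fewer whitespace-separated fields than the header; profiles_without_address is a hypothetical helper name.

```shell
#!/bin/sh
# profiles_without_address: reads `colima ls` output on stdin and prints the
# name of every profile whose row has fewer fields than the header row,
# i.e. whose trailing ADDRESS column is empty.
# Assumption: ADDRESS is the last column, as in the output shown in this thread.
profiles_without_address() {
  awk 'NR == 1 { cols = NF; next } NF > 0 && NF < cols { print $1 }'
}

# Example:
#   colima ls | profiles_without_address
```

The field-count comparison is deliberately simple; it would need adjusting if another column could also be empty.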

I found this issue where someone describes the following steps as a workaround for this "empty" address:

# 1. Start Colima without network address flag
colima start

# 2. Get into vm
colima ssh

# 3. Disable IPv6
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1

# 4. Start Colima as normal
colima start --cpu 8 --memory 12 --arch x86_64 --vm-type=vz --network-address
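
Since sysctl -w sets the value for the running kernel only, it can help to confirm what the VM currently reports before and after a restart. The following is a minimal sketch, assuming colima ssh -- <command> runs the command inside the VM and that sysctl is available there; vm_ipv6_disabled is a hypothetical helper name.

```shell
#!/bin/sh
# vm_ipv6_disabled [PROFILE]
# Hypothetical helper: returns 0 if the VM reports IPv6 as disabled,
# 1 if it reports IPv6 as enabled, and 2 if the value could not be read.
vm_ipv6_disabled() {
  profile=${1:-default}
  value=$(colima ssh --profile "$profile" -- \
    sysctl -n net.ipv6.conf.all.disable_ipv6 2>/dev/null) || return 2
  # Strip a trailing carriage return in case the ssh session allocates a TTY.
  value=$(printf '%s' "$value" | tr -d '\r')
  [ "$value" = "1" ]
}

# Example:
#   if vm_ipv6_disabled default; then echo "IPv6 is disabled in the VM"; fi
```
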

palmaguer commented 7 months ago

Testing on a non-emulated VM (aarch64)

  1. Create a Colima VM without emulating x86_64.
colima start --cpu 6 --memory 12 --arch aarch64 --vm-type=vz --vz-rosetta --network-address --profile aarch64
  2. Confirm the new machine.
~ % colima ls
PROFILE    STATUS     ARCH       CPUS    MEMORY    DISK     RUNTIME    ADDRESS
aarch64    Running    aarch64    6       12GiB     60GiB    docker     192.168.106.2
x86_64     Running    x86_64     8       12GiB     60GiB    docker     
  3. Run an nginx container.
docker run --rm -p 8080:80 nginx
  4. Test (200 OK).
ords % wget --no-check-certificate --spider --server-response http://localhost:8080
Spider mode enabled. Check if remote file exists.
--2024-02-12 20:23:02--  http://localhost:8080/
Resolving localhost (localhost)... 127.0.0.1, ::1
Connecting to localhost (localhost)|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: nginx/1.25.3
  Date: Tue, 13 Feb 2024 02:23:02 GMT
  Content-Type: text/html
  Content-Length: 615
  Last-Modified: Tue, 24 Oct 2023 13:46:47 GMT
  Connection: keep-alive
  ETag: "6537cac7-267"
  Accept-Ranges: bytes
Length: 615 [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

mpstadler commented 7 months ago

Same issue here. I can confirm that disabling IPv6 and restarting the Colima VM, as recommended by pablodaniel03 here, helped. Thank you!

Zordid commented 7 months ago

Also, same issue here. I need to emulate x86 architecture using colima on my M3 Mac. But I cannot get the forwarded ports to be reachable from outside:

❯ docker port 376c82ad5d8d
9092/tcp -> 0.0.0.0:9092
9092/tcp -> [::]:9092
❯ nc -zv localhost 9092
nc: connectx to localhost port 9092 (tcp) failed: Connection refused
nc: connectx to localhost port 9092 (tcp) failed: Connection refused
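
To check every published port of a container rather than one at a time, the host ports can be pulled out of the docker port output. The following is a minimal sketch, assuming the PORT/PROTO -> ADDRESS:PORT line format shown above; published_host_ports is a hypothetical helper name.

```shell
#!/bin/sh
# published_host_ports: reads `docker port <container>` output on stdin and
# prints each distinct host port once, in numeric order.
# Assumption: every line ends in ADDRESS:PORT, e.g. "0.0.0.0:9092" or "[::]:9092".
published_host_ports() {
  # The host port is whatever follows the last ":" of the last field.
  awk 'NF > 0 { n = split($NF, parts, ":"); print parts[n] }' | sort -un
}

# Example: probe each published port of one container from the host.
#   docker port 376c82ad5d8d | published_host_ports |
#     while read -r port; do nc -zv localhost "$port"; done
```

Splitting on the last colon handles both the IPv4 and the bracketed IPv6 bind addresses without treating them differently.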

Colima is up fine, docker ps shows everything is fine, the docker-compose logs are just fine. Only issue is: I cannot reach the ports. :(

Also running latest Colima 0.6.8

Please help...

kdescoteaux-uptycs commented 5 months ago

I'm seeing something similar. All I have been able to determine from debugging is that the ssh process dies somewhere along the way. All the forwarded ports are then lost (nothing is listening any longer), and any colima ssh session is terminated with

FATA[0650] exit status 255  

The ssh process respawns, but the port config is not restored, and from then on only an ssh-based docker context works. It is the ssh process that listens on the unix docker.sock, so that listener is not restored when the ssh process respawns; it comes back only when colima is restarted. Broken state:

kdescoteaux@kdescoteaux-mac cloud % docker ps                     
Cannot connect to the Docker daemon at unix:///Users/kdescoteaux/.colima/default/docker.sock. Is the docker daemon running?
kdescoteaux@kdescoteaux-mac cloud % docker --context colima ps
Cannot connect to the Docker daemon at unix:///Users/kdescoteaux/.colima/default/docker.sock. Is the docker daemon running?
kdescoteaux@kdescoteaux-mac cloud % docker --context colima-ssh ps
CONTAINER ID   IMAGE                                                                                COMMAND                  CREATED             STATUS                       PORTS                                                                                                                             NAMES
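
The broken state above, in which only the ssh-based context still answers, can be detected by trying each context in turn. The following is a minimal sketch using the colima and colima-ssh context names from this comment; first_working_context is a hypothetical helper name.

```shell
#!/bin/sh
# first_working_context CONTEXT...
# Hypothetical helper: prints the first Docker context whose daemon answers
# `docker ps`, and returns 1 if none of them does.
first_working_context() {
  for ctx in "$@"; do
    if docker --context "$ctx" ps >/dev/null 2>&1; then
      printf '%s\n' "$ctx"
      return 0
    fi
  done
  return 1
}

# Example:
#   ctx=$(first_working_context colima colima-ssh) && echo "using context: $ctx"
```

If this prints colima-ssh while colima fails, the socket listener has been lost and the daemon itself is still running inside the VM.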

"Recovery"


kdescoteaux@kdescoteaux-mac cloud % colima restart
INFO[0000] stopping colima                              
INFO[0000] stopping ...                                  context=docker
INFO[0011] stopping ...                                  context=vm
INFO[0012] done                                         
INFO[0015] starting colima                              
INFO[0015] runtime: docker                              
INFO[0031] starting ...                                  context=vm
INFO[0042] provisioning ...                              context=docker
INFO[0043] starting ...                                  context=docker
INFO[0044] done                                         
kdescoteaux@kdescoteaux-mac cloud % docker ps                     
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
kdescoteaux@kdescoteaux-mac cloud % docker --context colima ps    
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
kdescoteaux@kdescoteaux-mac cloud % docker --context colima-ssh ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES