abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

Critical bug affecting v0.6.9 that causes docker, compose and colima itself to stall and crash #1038

Open matteoredz opened 3 months ago

matteoredz commented 3 months ago

Description

Colima was installed through Homebrew on my M1 Mac running the latest macOS Sonoma 14.5. I was previously running Colima v0.6.8 without problems.

Because I need to build AMD64 images, I start Colima as follows:

colima start --profile rosetta-arm --cpu 4 --memory 8 --arch aarch64 --vm-type=vz --vz-rosetta --mount-type virtiofs

and then I run my services through Docker, Docker Compose and Docker Buildx, all installed through Homebrew:

❯ docker --version
Docker version 26.1.3, build b72abbb6f0

❯ docker-compose --version
Docker Compose version 2.27.1

❯ docker-buildx version
github.com/docker/buildx v0.14.1 Homebrew

After upgrading Colima to the latest v0.6.9 I couldn't run my services anymore. Every Docker or Colima-related command became unresponsive, and pulling the service images stalled forever.

I noticed that, after a couple of minutes, the docker socket stopped responding, causing Colima to crash as well.
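For anyone trying to catch the moment the socket stops responding, one option is to wrap a Docker command in a time limit so that a hang shows up as a failure instead of blocking the terminal. The sketch below is only an illustration: `run_with_timeout` is a hypothetical helper (not part of Colima or Docker), and the 10-second limit in the usage comment is an arbitrary assumption.

```shell
# Hypothetical helper: run a command, but kill it if it exceeds a time limit.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or a non-zero status
# (typically 137) if the command had to be killed.
run_with_timeout() {
  limit=$1
  shift
  # Start the command in the background and remember its PID.
  "$@" &
  cmd_pid=$!
  # Watchdog: after the limit, kill the command if it is still running.
  ( sleep "$limit"; kill -9 "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog_pid=$!
  # Wait for the command to finish or to be killed by the watchdog.
  status=0
  wait "$cmd_pid" || status=$?
  # The command is done, so the watchdog is no longer needed.
  kill "$watchdog_pid" 2>/dev/null
  wait "$watchdog_pid" 2>/dev/null
  return "$status"
}

# Example (assumed 10-second limit):
# run_with_timeout 10 docker info >/dev/null || echo "docker socket not responding"
```

Running the example line in a loop while `docker compose up -d` is pulling images should show roughly when the socket stops answering.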

My solution was to roll back manually to v0.6.8. The process was a bit tricky because previous versions of Colima aren't available through the brew CLI.
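One possible way to do such a rollback without Homebrew is to fetch the release binary directly from GitHub. This is a rough sketch, not necessarily the steps used here: `colima_release_url` is a hypothetical helper, and it assumes release assets are named `colima-<OS>-<ARCH>` (for example `colima-Darwin-arm64`) and that `/usr/local/bin` comes before the Homebrew copy on your `PATH`.

```shell
# Hypothetical helper: build the download URL for a Colima release binary.
# Assumption: assets are named colima-<OS>-<ARCH>, matching `uname` output.
# Usage: colima_release_url VERSION [OS] [ARCH]
colima_release_url() {
  version=$1
  os=${2:-$(uname)}       # defaults to the current OS, e.g. Darwin
  arch=${3:-$(uname -m)}  # defaults to the current architecture, e.g. arm64
  echo "https://github.com/abiosoft/colima/releases/download/${version}/colima-${os}-${arch}"
}

# Example (commented out because it downloads and installs a binary):
# curl -fLO "$(colima_release_url v0.6.8)"
# sudo install "colima-$(uname)-$(uname -m)" /usr/local/bin/colima
```

Check the asset names on the release page before relying on the generated URL.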

Version

❯ colima version
colima version 0.6.9
git commit: c3a31ed05f5fab8b2cdbae835198e8fb1717fd0f

Operating System

Output of colima status

❯ colima status rosetta-arm
INFO[0000] colima [profile=rosetta-arm] is running using macOS Virtualization.Framework
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: virtiofs
INFO[0000] socket: unix:///$HOME/.colima/rosetta-arm/docker.sock

Reproduction Steps

  1. Install Colima v0.6.9
  2. Run docker compose up -d
  3. Wait until everything crashes

Expected behaviour

No response

Additional context

No response

abiosoft commented 3 months ago

I am unable to reproduce this yet. I have tested on both an M1 Mac mini and an M1 Pro MacBook.

Hopefully someone else will encounter the same issue and we'll be able to see a pattern.

matteoredz commented 3 months ago

@abiosoft To add more context, the main Compose file is running 22 services (including ELK, Kafka, Prometheus, Grafana, etc). I had a look at the diff between 0.6.8 and 0.6.9 and the only "smell" I saw there is this commit: https://github.com/abiosoft/colima/commit/bd99ce23be4c3caa9a31d30c72816696bc8516a5

abiosoft commented 3 months ago

@matteoredz can you share the compose file? You can redact private details from it.

nathanielop commented 2 months ago

I run into this issue every so often with my docker compose build stack (and I believe I may have hit it once after the initial build?), but I'm running 0.6.8 on macOS Intel, so I'm not sure what the relation could be. It seems to be somewhat hit or miss, since the same docker compose sometimes works and sometimes doesn't. I've noticed that it's more prone to occur when I have more images loaded locally (e.g. when switching between a personal docker compose and a work docker compose) and/or when the docker compose has more images/containers. Once I hit the issue, all commands/image pulls/builds just hang indefinitely; for example, attempting to colima stop through the CLI just hangs with no output. I end up having to force kill all lima/colima processes via Activity Monitor.

Here's my colima version output. I haven't upgraded recently and have encountered this issue somewhat recently.

colima version 0.6.8
git commit: 9b0809d0ed9ad3ff1e57c405f27324e6298ca04f

runtime: docker
arch: x86_64
client: v26.0.0
server: v24.0.7

If I hit this again I'll make sure to update with the output of colima status

abiosoft commented 2 months ago

The freezes and crashes have been narrowed down to the disk-image expansion process at startup when VZ is used.

Disk image download and expansion have been reworked and the issue should now be resolved (with or without VZ being used).

Can you try the latest development version of Colima and see if the issue is now resolved?

# install development version
brew install --HEAD colima

# remove existing profile if any
colima delete

# start colima
colima start

nathanielop commented 2 months ago

I've yet to find an exact reproduction setup for this issue, but I hit it reliably every few days. I'll try the development version and see if the problem recurs. Thanks.

eltoky commented 1 month ago

I had the same issue as mentioned here, and after switching to the HEAD (development) version of Colima it is all solved.

Before that, it was pretty easy to crash on my M3 Mac:

docker run -it ubuntu bash
# inside the docker
apt-get update
apt-get install -y wget
wget https://cdn.downloads.dataiku.com/public/dss/12.6.2/dataiku-dss-12.6.2.tar.gz

It would just hang before finishing the download and then crash Colima (Colima was no longer responsive and I needed to kill -9 all the processes).