abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

amd64/docker: start fails when vm is restarted after docker installation #1053

Open dimaqq opened 3 months ago

dimaqq commented 3 months ago

Description

Fails when using docker/moby runtime:

🦐/c/hexanator (main)> colima start --profile amd64 --vm-type=vz --vz-rosetta --arch amd64 --cpu 8 --memory 8 --disk 50 --network-address --mount /code:w -e
INFO[0000] editing in vim from $EDITOR environment variable
INFO[0025] starting colima [profile=amd64]
INFO[0025] runtime: docker
INFO[0026] creating and starting ...                     context=vm
INFO[0081] provisioning ...                              context=docker
INFO[0082] starting ...                                  context=docker
> [hostagent] Shutting down the host agent
> "[hostagent] failed to exit SSH master" error="failed to execute `ssh -O exit -p 49514 127.0.0.1`, out=\"Control socket connect(/Users/dima/.colima/_lima/colima-amd64/ssh.sock): No such file or directory\\r\\n\": exit status 255"
> [hostagent] Shutting down QEMU with ACPI
> "[hostagent] failed to open the QMP socket \"/Users/dima/.colima/_lima/colima-amd64/qmp.sock\", forcibly killing QEMU" error="dial unix /Users/dima/.colima/_lima/colima-amd64/qmp.sock: connect: connection refused"
> [hostagent] QEMU has already exited
> exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/dima/.colima/_lima/colima-amd64/ha.stderr.log")
FATA[0097] error starting docker: error at 'starting': exit status 1
⏎

The last line in the serial console log is "GRUB_FORCE_PARTUUID set, attempting initrdless boot."

(/code is my case-sensitive volume for source code that I intend to use)

At the same time:

Version

🦐/c/hexanator (main)> colima version && limactl --version && qemu-img --version
colima version 0.6.9
git commit: c3a31ed05f5fab8b2cdbae835198e8fb1717fd0f
limactl version 0.22.0
qemu-img version 9.0.1
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers


dimaqq commented 3 months ago

Let me know how I can help with this; the issue is trivially reproducible. I think I saw in the logs that the SSH port is 0 (SSHLocalPort:0)?

CarterFendley commented 3 weeks ago

@dimaqq I don't think this is your issue, but mine was that I was requesting too much memory.

Specifically, I was passing --memory 8096 (copied from a previous minikube command) to colima. I think colima treats this as 8096 GiB of memory (roughly 8 TiB), which very consistently caused what looks like the same error. Changing to --memory 8 fixed the problem.

> [hostagent] Shutting down the host agent
> "[hostagent] failed to exit SSH master" error="failed to execute `ssh -O exit -p 59831 127.0.0.1`, out=\"Control socket connect(/Users/user/.colima/_lima/colima/ssh.sock): No such file or directory\\r\\n\": exit status 255"
> [hostagent] Shutting down QEMU with ACPI
> "[hostagent] failed to open the QMP socket \"/Users/user/.colima/_lima/colima/qmp.sock\", forcibly killing QEMU" error="dial unix /Users/user/.colima/_lima/colima/qmp.sock: connect: connection refused"
> [hostagent] QEMU has already exited
> exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/user/.colima/_lima/colima/ha.stderr.log")

TL;DR: Not a great error message; check whether any of your parameters are wrong.
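
For anyone who wants to catch this kind of mix-up before the VM even boots, here is a rough sketch of a pre-flight check on the --memory value. It assumes the value is interpreted as GiB, as described above. The function name check_memory_flag, the host_gib parameter, and the 512 GiB cutoff are all made up for illustration; none of this is part of colima.

```python
from typing import Optional

# Hypothetical cutoff: assume no desktop host has more RAM than this (in GiB).
SUSPICIOUS_GIB = 512.0


def check_memory_flag(value: str, host_gib: Optional[float] = None) -> Optional[str]:
    """Return a warning if a --memory value looks wrong, or None if it seems plausible.

    Assumes the value is read as GiB. If host_gib is given, it is used as the
    upper limit instead of the generic SUSPICIOUS_GIB cutoff.
    """
    try:
        gib = float(value)
    except ValueError:
        return f"--memory {value!r} is not a number"

    if gib <= 0:
        return f"--memory {value} must be positive"

    limit = host_gib if host_gib is not None else SUSPICIOUS_GIB
    if gib > limit:
        # A value like 8096 is plausible in MiB but not in GiB.
        return (
            f"--memory {value} would request {gib:g} GiB, more than the "
            f"{limit:g} GiB limit; was the value meant as MiB "
            f"(about {gib / 1024:.1f} GiB)?"
        )
    return None


if __name__ == "__main__":
    for candidate in ("8", "8096"):
        print(candidate, "->", check_memory_flag(candidate))
```

Passing the host's actual RAM as host_gib gives a tighter check than the generic cutoff, but how to obtain that number is platform-specific and left out here.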