abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

VM cannot get network address and start k8s after OS restart #642

Open RostislavDublin opened 1 year ago

RostislavDublin commented 1 year ago

Description

A Colima VM created on an M1 Mac running macOS Ventura with the --kubernetes and --network-address options enabled loses its network address (192.168.106.2) and cannot start k8s after macOS reboots. Deleting the VM, rebooting the Mac, and starting a new VM resolves the issue, but only until the next reboot.

Version

Colima Version: 0.5.2
Lima Version: 0.15.0
Qemu Version: I don't know

Operating System

Output of colima status

~$ colima status
INFO[0000] colima is running using QEMU
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: sshfs
INFO[0000] address:
INFO[0000] socket: unix:///Users/Rostislav_Dublin/.colima/default/docker.sock
INFO[0000] kubernetes: enabled
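The blank address field in the output above is the visible symptom. As a minimal sketch for detecting it from a script, assuming the status output keeps the "address:" line format shown above, the helpers below read that output on stdin. The function names are hypothetical and not part of colima.

```shell
# Hypothetical helper (not part of colima): read `colima status` output on
# stdin and print the IPv4-looking value after "address:".
# Prints nothing when the field is blank or the line is missing.
colima_address() {
  sed -n 's/.*address:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: exit status 0 when an address is present on stdin,
# 1 when it is blank or missing.
has_colima_address() {
  [ -n "$(colima_address)" ]
}
```

A possible use is `colima status 2>&1 | has_colima_address || echo "no VM address assigned"`. The `2>&1` is there in case the INFO lines are written to stderr rather than stdout.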

Reproduction Steps

  1. Create a VM with --kubernetes and --network-address enabled, deploy some k8s workloads, and feel happy...

  2. Stop and start your VM if needed, and reconfigure the CPU and memory settings with no problems.

  3. Shut down and reboot your Mac.

  4. Make sure you cannot successfully get your Kubernetes back to life anymore:

    • each time you call "colima start" it now takes too long...

    • and during the VM startup you see (in a second terminal window) a blank value in the "colima list" command output ADDRESS column

      image
    • and after several minutes of waiting you see the final output message:

      image
    • and if you run "docker ps -a" you see all containers (including k8s) stopped:

      image
  5. Delete your Colima VM.

  6. Reboot your Mac (reboot is mandatory!)

  7. Start a new Colima VM

  8. Now you have your k8s again... until the next Mac restart.
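Step 4 above involves waiting several minutes to see whether the cluster comes back. A minimal sketch of a bounded wait, assuming a plain POSIX shell, could look like the following. The `retry` function is a hypothetical helper, not something colima provides.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and succeed as soon as the command succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 30 10 kubectl cluster-info` would poll the cluster for roughly five minutes and return a non-zero status if it never answers, which makes the failure scriptable instead of something to watch for by hand.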

Expected behaviour

The VM survives OS restarts and gets its network address and k8s back.

Additional context

No response

abiosoft commented 1 year ago

Seems the failed network address allocation is the cause. Does this happen quite often for you?

RostislavDublin commented 1 year ago

I hit the issue 100% of the time, on every Mac reboot. I deliberately tested it multiple times, and the pattern was always the same as described above.

RostislavDublin commented 1 year ago

@abiosoft, how can I help you to get more details on this?

jaroslav-kubicek commented 1 year ago

unfortunately, exactly the same started happening to me today on Mac as well

I tried colima delete, reinstalled colima, and attempted to start again only to get stuck on the following:

$ colima start --cpu 5 --memory 10 --disk 40 --kubernetes --network-address
INFO[0000] starting colima
INFO[0000] runtime: docker+k3s
INFO[0000] preparing network ...                         context=vm
WARN[0015] error setting up network dependencies: error at 'preparing network': error running [/opt/homebrew/bin/colima daemon status default], output: "time=\"2023-03-07T17:31:48+01:00\" level=fatal msg=\"pid file not found: stat /Users/jaroslav.kubicek/.colima/default/daemon/daemon.pid: no such file or directory\"", err: "exit status 1"  context=vm
INFO[0015] creating and starting ...                     context=vm
WARN[0015] error setting up reachable IP address: vmnet socket file not found: stat /Users/jaroslav.kubicek/.colima/default/daemon/vmnet.sock: no such file or directory
> [hostagent] Waiting for the essential requirement 1 of 5: "ssh"

EDIT: this error got solved by restarting, but I'm still getting the same error as described here in the issue:

FATA[0093] error starting kubernetes: error running [lima kubectl cluster-info], output: "The connection to the server localhost:8080 was refused - did you specify the right host or port?", err: "exit status 1"

RostislavDublin commented 1 year ago

I temporarily uninstalled Colima and returned to Docker Desktop. Such a pity. I really liked Colima's approach and would like to continue with it. Please ping me when the issue is fixed. Thank you for your great efforts!

abiosoft commented 1 year ago

The network address issue mainly surfaced in macOS Ventura; it was more stable in older macOS versions.

Considering there have been reports of better experience with bridged network, the ability to toggle between bridged and shared is being worked on.

The preference is still shared network and we will keep troubleshooting to find the root cause of the erratic behaviour.

abiosoft commented 1 year ago

@RostislavDublin can you try the latest development version brew install --head colima and see if the issue still persists?

Thanks.

speedupmate commented 1 year ago

This issue still exists after brew install --head colima.

emanuil-tolev commented 1 year ago

FYI, if you have colima at commit https://github.com/abiosoft/colima/commit/20ba980d963a36cb71c5844c80caf6bcee13d7cd or later (v0.5.5 will suffice, or reinstall using --head as suggested above), there is a workaround for this: assign a static IP via the COLIMA_IP env var at the very end of your colima.yaml file.

env:
  COLIMA_IP: 192.168.106.10
  # and any other env vars you need, if any

Thanks for adding this option while the original problem can be trouble-shot!
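To confirm that the workaround is actually present in a given config file, a small check like the one below may help. It is only a sketch: the function name is hypothetical, it assumes the value is written unquoted as in the example above, and the path in the usage note is an assumption based on the ~/.colima/default directory seen in the logs in this thread.

```shell
# Hypothetical helper (not part of colima): print the COLIMA_IP value set in
# the given colima.yaml file. Prints nothing if the key is absent or the
# line is commented out. Assumes an unquoted IPv4 value.
configured_colima_ip() {
  sed -n 's/^[[:space:]]*COLIMA_IP:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' "$1" | head -n 1
}
```

For the default profile this might be called as `configured_colima_ip "$HOME/.colima/default/colima.yaml"`. Empty output would suggest the static IP has not been configured yet.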

sastorsl commented 8 months ago

I haven't run all the scenarios, but could this be related to not getting a static IP on reboot? I seem at least to get this when moving my Mac from the office to the home office, but testing it full on requires some time.

A workaround that lets one edit or add the network on an already configured machine would at least help, since the images would not need to be downloaded again.

norrs commented 7 months ago

See if this might be the cause: https://github.com/abiosoft/colima/issues/458#issuecomment-1989839779