abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

Error starting with --network-address #458

Open liaden opened 2 years ago

liaden commented 2 years ago

Description

Attempting to start with --network-address fails. Subsequent starts without the flag fail with a different error message. Removing ~/.colima allows a successful start without the network address.

Version

Colima Version: 0.4.6
Lima Version: 0.12.0, 0.13.0
Qemu Version: 7.1.0

Operating System

Reproduction Steps

  1. colima start --network-address

Expected behaviour

Colima starts correctly.

Additional context

First start:

INFO[0000] starting colima
INFO[0000] runtime: docker
INFO[0000] preparing network ...                         context=vm
WARN[0015] error starting network: error at 'preparing network': error running [/opt/homebrew/bin/colima daemon status default], output: "time=\"2022-10-27T09:57:15-04:00\" level=fatal msg=\"pid file not found: stat /Users/joel.johnson/.colima/default/daemon/daemon.pid: no such file or directory\"", err: "exit status 1"  context=vm
INFO[0015] creating and starting ...                     context=vm
WARN[0015] error setting up routable IP address: vmnet ptp socket file not found: stat /Users/joel.johnson/.colima/default/daemon/vmnet.ptp: no such file or directory
> Using cache "/Users/joel.johnson/Library/Caches/lima/download/by-url-sha256/e9bac04e9bdb31be4d3de1506d97eb60d59d9ad1a2d97f2b21f760e06f3e4408/data"
> [hostagent] Starting QEMU (hint: to watch the boot progress, see "/Users/joel.johnson/.lima/colima/serial.log")
> SSH Local Port: 57347
> [hostagent] Waiting for the essential requirement 1 of 5: "ssh"
> [hostagent] QEMU has exited
> exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/joel.johnson/.lima/colima/ha.stderr.log")
FATA[0017] error starting vm: error at 'creating and starting': exit status 1

ha.stderr.log:

{"level":"debug","msg":"Creating iso file /Users/joel.johnson/.lima/colima/cidata.iso","time":"2022-10-27T09:57:15-04:00"}
{"level":"debug","msg":"Using /var/folders/1n/fvnyc01554s0240g36s382jh6nvfqs/T/diskfs_iso413631167 as workspace","time":"2022-10-27T09:57:15-04:00"}
{"level":"debug","msg":"firmware candidates = [/Users/joel.johnson/.local/share/qemu/edk2-aarch64-code.fd /Users/joel.johnson/.colima/_wrapper/3a9197e1ca3cd2da076da2b473d7a7eb118e2cca/share/qemu/edk2-aarch64-code.fd /usr/share/AAVMF/AAVMF_CODE.fd /usr/share/qemu-efi-aarch64/QEMU_EFI.fd]","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"OpenSSH version 8.6.1 detected","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"AES accelerator seems available, prioritizing aes128-gcm@openssh.com and aes256-gcm@openssh.com","time":"2022-10-27T09:57:17-04:00"}
{"level":"info","msg":"Starting QEMU (hint: to watch the boot progress, see \"/Users/joel.johnson/.lima/colima/serial.log\")","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"qCmd.Args: [/Users/joel.johnson/.colima/_wrapper/3a9197e1ca3cd2da076da2b473d7a7eb118e2cca/bin/qemu-system-aarch64 -m 2048 -cpu host -machine virt,accel=hvf,highmem=off -smp 2,sockets=1,cores=2,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Users/joel.johnson/.colima/_wrapper/3a9197e1ca3cd2da076da2b473d7a7eb118e2cca/share/qemu/edk2-aarch64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/joel.johnson/.lima/colima/basedisk,media=cdrom,readonly=on -drive file=/Users/joel.johnson/.lima/colima/diffdisk,if=virtio,discard=on -cdrom /Users/joel.johnson/.lima/colima/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:57347-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:56:07:0a -device virtio-rng-pci -display none -vga none -device ramfb -device qemu-xhci,id=usb-bus -device usb-kbd,bus=usb-bus.0 -device usb-mouse,bus=usb-bus.0 -parallel none -chardev socket,id=char-serial,path=/Users/joel.johnson/.lima/colima/serial.sock,server=on,wait=off,logfile=/Users/joel.johnson/.lima/colima/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/joel.johnson/.lima/colima/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-colima -pidfile /Users/joel.johnson/.lima/colima/qemu.pid]","time":"2022-10-27T09:57:17-04:00"}
{"level":"info","msg":"Waiting for the essential requirement 1 of 5: \"ssh\"","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"qemu[stderr]: time=\"2022-10-27T09:57:17-04:00\" level=fatal msg=\"dial unix /Users/joel.johnson/.colima/default/daemon/gvproxy.sock: connect: no such file or directory\"","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"executing script \"ssh\"","time":"2022-10-27T09:57:17-04:00"}
{"error":"exit status 1","level":"info","msg":"QEMU has exited","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"executing ssh for script \"ssh\": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile=\"/Users/joel.johnson/.lima/_config/user\" -o IdentityFile=\"/Users/joel.johnson/.ssh/id_ed25519\" -o IdentityFile=\"/Users/joel.johnson/.ssh/id_rsa\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^aes128-gcm@openssh.com,aes256-gcm@openssh.com\" -o User=lima -o ControlMaster=auto -o ControlPath=\"/Users/joel.johnson/.lima/colima/ssh.sock\" -o ControlPersist=5m -p 57347 127.0.0.1 -- /bin/bash]","time":"2022-10-27T09:57:17-04:00"}
{"level":"debug","msg":"stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 57347: Connection refused\\r\\n\", err=failed to execute script \"ssh\": stdout=\"\", stderr=\"ssh: connect to host 127.0.0.1 port 57347: Connection refused\\r\\n\": exit status 255","time":"2022-10-27T09:57:17-04:00"}

Second start:

INFO[0000] starting colima
INFO[0000] runtime: docker
INFO[0000] preparing network ...                         context=vm
WARN[0015] error starting network: error at 'preparing network': error running [/opt/homebrew/bin/colima daemon status default], output: "time=\"2022-10-27T10:01:35-04:00\" level=fatal msg=\"pid file not found: stat /Users/joel.johnson/.colima/default/daemon/daemon.pid: no such file or directory\"", err: "exit status 1"  context=vm
WARN[0015] error setting up routable IP address: vmnet ptp socket file not found: stat /Users/joel.johnson/.colima/default/daemon/vmnet.ptp: no such file or directory
INFO[0015] starting ...                                  context=vm
> Using the existing instance "colima"
> [hostagent] Starting QEMU (hint: to watch the boot progress, see "/Users/joel.johnson/.lima/colima/serial.log")
> SSH Local Port: 57370
> [hostagent] QEMU has exited
> exiting, status={Running:false Degraded:false Exiting:true Errors:[] SSHLocalPort:0} (hint: see "/Users/joel.johnson/.lima/colima/ha.stderr.log")
FATA[0017] error starting vm: error at 'starting': exit status 1

The second failure is obviously due to colima.yaml having been updated by the --network-address flag on the first start.
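Both runs above fail on files that the colima daemon was expected to create (daemon.pid, vmnet.ptp, gvproxy.sock). As a rough aid for triaging similar logs, here is a minimal Python sketch that pulls the missing paths out of log text. The function name and regular expression are illustrative assumptions, not part of colima or lima; the sample lines are shortened excerpts of the messages above.

```python
import re

# Matches both "stat <path>: no such file or directory" and
# "dial unix <path>: connect: no such file or directory".
MISSING_PATH = re.compile(
    r"(?:stat|dial unix) (\S+?): (?:connect: )?no such file or directory"
)


def missing_daemon_files(log_text: str) -> list[str]:
    """Return the distinct missing paths in first-seen order."""
    seen: list[str] = []
    for path in MISSING_PATH.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


# Shortened excerpts of the messages from the first start (hypothetical sample).
SAMPLE = "\n".join([
    "pid file not found: stat /Users/joel.johnson/.colima/default/daemon/daemon.pid: no such file or directory",
    "vmnet ptp socket file not found: stat /Users/joel.johnson/.colima/default/daemon/vmnet.ptp: no such file or directory",
    "dial unix /Users/joel.johnson/.colima/default/daemon/gvproxy.sock: connect: no such file or directory",
])

if __name__ == "__main__":
    for path in missing_daemon_files(SAMPLE):
        print(path)
```

All three paths live under ~/.colima/default/daemon, which is consistent with the daemon never having started, although the logs alone do not say why.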

madpah commented 10 months ago

I've had to revise my solution to run this command inside the colima VM (after sudo):

ip address add 192.168.106.2/24 broadcast + dev col0
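For readers unfamiliar with the `broadcast +` form: iproute2 derives the broadcast address by setting the host bits of the given prefix. A minimal Python sketch of the same arithmetic, using only the standard library (the helper name is an illustrative assumption):

```python
import ipaddress


def broadcast_for(cidr: str) -> str:
    """Return the broadcast address implied by an address/prefix pair."""
    iface = ipaddress.ip_interface(cidr)
    return str(iface.network.broadcast_address)


if __name__ == "__main__":
    print(broadcast_for("192.168.106.2/24"))  # 192.168.106.255
```

So the command above assigns 192.168.106.2 to col0 with 192.168.106.255 as its broadcast address.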

norrs commented 8 months ago

So there seem to be multiple issues in the project regarding creating a colima VM and successfully obtaining an IPv4 address after --network-address is enabled during creation.

@norrs doing some research:

After reading the source code for colima and trying a million things mentioned in the various issues, I found a hint in the comment here in this issue https://github.com/abiosoft/colima/issues/458#issuecomment-1741879608 which led me to research whether the VM actually got an IPv4 address from the DHCP server and whether there were possible firewall issues. This led me to the great issue over at lima-vm at https://github.com/lima-vm/lima/issues/1259 and further into https://github.com/lima-vm/socket_vmnet/issues/18#issuecomment-1574149506 which seems to confirm my suspicion.

Screenshot 2024-03-12 at 02 18 06

You really want to keep the firewall enabled rather than disable it, and bootpd should not really be blocked according to the description here, so this sounds like an Apple bug, as mentioned in the socket_vmnet issue.

Executing the commands below adds a firewall rule for bootpd and allows the VM to obtain an IPv4 address over DHCP.

$ sudo /usr/libexec/ApplicationFirewall/socketfilterfw --add /usr/libexec/bootpd
$ sudo /usr/libexec/ApplicationFirewall/socketfilterfw --unblock /usr/libexec/bootpd

However, you might be blocked due to MDM (mobile device management) in your company if you are unlucky: https://github.com/lima-vm/socket_vmnet/issues/18#issuecomment-1717216573 and https://github.com/lima-vm/socket_vmnet/issues/18#issuecomment-1727195929 .

This issue seems to already be documented in lima-vm/socket_vmnet:README.

FYI @abiosoft - hope it helps clear up the errors others are experiencing.

For everyone else: this is also a good introduction to how lima does networking under the hood: https://lima-vm.io/docs/config/network/

And my background story and 😴 researching:

I had this working on Friday. I believe I did a required system upgrade due to policy on the computer during the weekend (and probably had colima running in the background) and ended up with Sonoma 14.4 on my newly acquired Mac Pro M3. After the reboot I noticed that my VMs under colima did not receive IPv4 addresses.

Silvanoc commented 7 months ago

It is sort of weird that I'm observing something completely different from what @norrs is reporting in this comment. I wonder if we're observing completely different issues or the same issue with different manifestations due to slight differences in our set-ups.

These are the observed differences.

Everything works and it obtains an IPv4 address inside the VM on interface col0 that is reachable from the host when using virtual machine type vz ( $ colima start --vm-type vz --network-address ).

In my case VZ fails because the interface col0 is not even being created...

It sadly fails for qemu ($ colima start --arch=x86_64 --network-address)

In my case for QEMU it works perfectly.

My setup:


How to reproduce the issue:

### VZ

#### Command line

`colima --profile test-net delete -f ; sleep 5 ; colima start --profile test-net --network-address --arch x86_64 --vm-type vz && colima --profile test-net ssh ip link ; colima --profile test-net status`

#### Output

``` shell
INFO[0000] deleting colima [profile=test-net]
INFO[0000] done
INFO[0000] starting colima [profile=test-net]
INFO[0000] runtime: docker
INFO[0000] creating and starting ... context=vm
INFO[0102] provisioning ... context=docker
INFO[0104] starting ... context=docker
INFO[0215] done
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:55:55:ce:b3:5a brd ff:ff:ff:ff:ff:ff
3: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:9b:d1:eb:ee brd ff:ff:ff:ff:ff:ff
INFO[0000] colima [profile=test-net] is running using macOS Virtualization.Framework
INFO[0000] arch: x86_64
INFO[0000] runtime: docker
INFO[0000] mountType: virtiofs
INFO[0000] socket: unix:///Users/[...]/.colima/test-net/docker.sock
```

### QEMU

#### Command line

`colima --profile test-net delete -f ; sleep 5 ; colima start --profile test-net --network-address --arch x86_64 --vm-type qemu && colima --profile test-net ssh ip link ; colima --profile test-net status`

#### Output

``` shell
INFO[0000] deleting colima [profile=test-net]
INFO[0000] deleting ... context=docker
INFO[0001] done
INFO[0000] starting colima [profile=test-net]
INFO[0000] runtime: docker
INFO[0001] creating and starting ... context=vm
INFO[0100] provisioning ... context=docker
INFO[0102] starting ... context=docker
INFO[0209] done
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:55:55:ce:b3:5a brd ff:ff:ff:ff:ff:ff
3: col0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:55:55:95:bc:04 brd ff:ff:ff:ff:ff:ff
4: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:2d:62:04:ef brd ff:ff:ff:ff:ff:ff
INFO[0000] colima [profile=test-net] is running using QEMU
INFO[0000] arch: x86_64
INFO[0000] runtime: docker
INFO[0000] mountType: virtiofs
INFO[0001] address: 192.168.106.4
INFO[0001] socket: unix:///Users/[...]/.colima/test-net/docker.sock
```

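
The difference between the two runs is whether `ip link` lists a col0 interface. A minimal Python sketch of that check, assuming the `ip link` text has been captured with one interface header per line as the command prints it (the function name is an illustrative assumption, and the sample strings are trimmed from the outputs above):

```python
import re

# An interface header line starts with an index, e.g. "3: col0: mtu 1500 ...".
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)")


def interface_names(ip_link_output: str) -> list[str]:
    """Return interface names in the order `ip link` printed them."""
    names = []
    for line in ip_link_output.splitlines():
        match = HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


VZ_OUTPUT = """\
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
3: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
"""

QEMU_OUTPUT = """\
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: eth0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
3: col0: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
4: docker0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
"""

if __name__ == "__main__":
    print("vz has col0:", "col0" in interface_names(VZ_OUTPUT))
    print("qemu has col0:", "col0" in interface_names(QEMU_OUTPUT))
```
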
sudeeshjohn commented 6 months ago

It is sort of weird that I'm observing something completely different from what @norrs is reporting in this comment. I wonder if we're observing completely different issues or the same issue with different manifestations due to slight differences in our set-ups.

Same here. The IP address is not showing up for colima start --network-address --arch x86_64 --vm-type vz

greycr0w commented 5 months ago

M1, Sonoma 14.5. On my side, this was introduced when I stopped colima to test something out.

I was able to fix it with the following steps:

  1. Uninstall colima
  2. Install --HEAD version: brew install --HEAD colima
  3. Restart docker daemon
  4. Execute with --vm-type vz
Silvanoc commented 5 months ago

After having reported here that colima start --arch=x86_64 --network-address was working fine for me, now it has stopped working 🤦

Silvanoc commented 5 months ago

@abiosoft I have just tried brew install --HEAD colima as you proposed in a quite old comment. So I'm now using c3a31ed and I am still having difficulty running colima start --arch=x86_64 --network-address on an M1 MacBook.

The reported error message is error calling fd_connect: fd_connect: dial unix /Users/[...]/.colima/test-net/daemon/vmnet.sock: connect: connection refused.

kaiserbrito commented 2 months ago

@abiosoft I have just tried brew install --HEAD colima as you proposed in a quite old comment. So I'm now using c3a31ed and I am still having difficulty running colima start --arch=x86_64 --network-address on an M1 MacBook.

The reported error message is error calling fd_connect: fd_connect: dial unix /Users/[...]/.colima/test-net/daemon/vmnet.sock: connect: connection refused.

I was having the same issue, so I followed the steps @abiosoft mentioned before:

brew uninstall colima qemu lima docker docker-compose
brew autoremove
brew cleanup
rm -rf ~/.docker ~/.colima
brew install colima qemu lima docker docker-compose
davidreghay commented 2 months ago

Seems like this is quite a persistent issue. Yesterday I was trying to follow @madpah's suggested workaround as given above, but on newer versions of lima running a newer release of Ubuntu the solution doesn't work, since Ubuntu switched over to netplan.io.

I was able to update the col0 device address on the colima instance by adapting those instructions as follows (diverges at step 3):

  1. Start your colima VM with whatever command you are using (ensuring you include the --network-address option)

(Wait for it to start) I confirm it's started with no network address by running:

colima status
INFO[0000] colima is running using macOS Virtualization.Framework
INFO[0000] arch: aarch64
INFO[0000] runtime: docker
INFO[0000] mountType: virtiofs
INFO[0000] address:
INFO[0000] socket: unix:///Users/username/.colima/default/docker.sock
  2. On my Mac, there should now be a network bridge adapter created - mine is always bridge100 - grab its IP address by running:

    ifconfig bridge100 | grep "inet "
    inet 192.168.106.1 netmask 0xffffff00 broadcast 192.168.106.255

  3. Open a shell inside the VM and become root:

    colima ssh
    (you are now in SSH inside VM)
    sudo su -

Edit the file /etc/netplan/50-cloud-init.yaml so that it looks similar to the below, using the IP address returned in step 2 as the value for gateway (don't alter the eth0 settings or anything for col0 besides the addresses field):

# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.  To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        col0:
            addresses:
                - 192.168.64.1/24
            match:
                macaddress: 52:55:55:54:9b:81
            set-name: col0
        eth0:
            dhcp4: true
            match:
                macaddress: 52:55:55:a5:45:37
            nameservers:
                addresses:
                - 192.168.5.2
            set-name: eth0
    version: 2

  4. Run netplan apply

This worked for me but of course is a stopgap solution rather than longterm one.
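
When picking a static address for col0, it may help to confirm that it falls inside the bridge's subnet on the host. The sketch below is only an illustration under that assumption: it parses the macOS `ifconfig` "inet" line shown in step 2 (hex netmask included) with Python's standard library. The function names are hypothetical, and 192.168.106.2/24 is taken from the earlier workaround in this thread.

```python
import ipaddress


def bridge_network(inet_line: str) -> ipaddress.IPv4Network:
    """Parse a macOS ifconfig 'inet' line into the bridge's IPv4 network."""
    fields = inet_line.split()
    address = fields[fields.index("inet") + 1]
    mask_hex = fields[fields.index("netmask") + 1]  # e.g. 0xffffff00
    mask = ipaddress.IPv4Address(int(mask_hex, 16))
    # strict=False because the bridge address has host bits set.
    return ipaddress.ip_network(f"{address}/{mask}", strict=False)


def in_bridge_subnet(candidate_cidr: str, inet_line: str) -> bool:
    """True if the candidate address lies inside the bridge's network."""
    candidate = ipaddress.ip_interface(candidate_cidr)
    return candidate.ip in bridge_network(inet_line)


INET_LINE = "inet 192.168.106.1 netmask 0xffffff00 broadcast 192.168.106.255"

if __name__ == "__main__":
    print(bridge_network(INET_LINE))                        # 192.168.106.0/24
    print(in_bridge_subnet("192.168.106.2/24", INET_LINE))  # True
```
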

@jamhed and @norrs your notes make sense but I'm not clear on how to apply them. Notably though, the manual fix as outlined above does not require permission to change any firewall or networking rules on the host machine. Maybe the best thing would be to supply a configuration option that assigns col0 a static address at startup instead of relying on DHCP?

jubr commented 1 month ago

Lemme add my $0.02 as I am also a long-time sufferer from --network-address-less-ness.
I finally decided to take the plunge, un-freeze my 0.5.6 & give it a go!
On M1 / Sonoma 14.7 / colima 0.7.5 / lima 0.23.2 / qemu 9.1.0 here. Applied socketfilterfw steps above.

I was watching tail -F /Users/jbraam/.colima/x86/daemon/daemon.log when I noticed:

time="2024-10-22T02:31:09+02:00" level=trace msg="cmd int [\"sudo\" \"/opt/colima/bin/socket_vmnet\" \"--vmnet-mode\" \"shared\" \"--socket-group\" \"staff\" \"--vmnet-gateway\" \"192.168.106.1\" \"--vmnet-dhcp-end\" \"192.168.106.254\" \"--pidfile\" \"/opt/colima/run/vmnet-x86.pid\" \"/Users/jbraam/.colima/x86/daemon/vmnet.sock\"]"
sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
sudo: a password is required
time="2024-10-22T02:31:09+02:00" level=error msg="error starting vmnet: error running vmnet: exit status 1"

So it looks like the sudoers rules might not be working?

$ cat /etc/sudoers.d/colima
# starting vmnet daemon
%staff ALL=(root:wheel) NOPASSWD:NOSETENV: /opt/colima/bin/socket_vmnet --vmnet-mode shared --socket-group staff --vmnet-gateway 192.168.106.1 --vmnet-dhcp-end 192.168.106.254 *
# terminating vmnet daemon
%staff ALL=(root:wheel) NOPASSWD:NOSETENV: /usr/bin/pkill -F /opt/colima/run/*.pid
# validating vmnet daemon
%staff ALL=(root:wheel) NOPASSWD:NOSETENV: /usr/bin/pkill -0 -F /opt/colima/run/*.pid

Perhaps this is because I am on a corp laptop.
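
One quick sanity check is whether the command from daemon.log even fits the text of the first sudoers rule. The Python sketch below approximates sudoers wildcard matching with the standard `fnmatch` module; real sudoers matching differs in its details, so treat this as a rough check rather than a reimplementation. The helper name is an illustrative assumption; the pattern and arguments are copied from the output above.

```python
import fnmatch

# Command pattern from the "starting vmnet daemon" rule above.
SUDOERS_PATTERN = (
    "/opt/colima/bin/socket_vmnet --vmnet-mode shared --socket-group staff "
    "--vmnet-gateway 192.168.106.1 --vmnet-dhcp-end 192.168.106.254 *"
)

# Argument vector as logged in daemon.log.
LOGGED_ARGV = [
    "sudo", "/opt/colima/bin/socket_vmnet",
    "--vmnet-mode", "shared",
    "--socket-group", "staff",
    "--vmnet-gateway", "192.168.106.1",
    "--vmnet-dhcp-end", "192.168.106.254",
    "--pidfile", "/opt/colima/run/vmnet-x86.pid",
    "/Users/jbraam/.colima/x86/daemon/vmnet.sock",
]


def rule_text_matches(argv: list[str], pattern: str) -> bool:
    """Approximate check: does the command line fit the sudoers pattern?"""
    args = argv[1:] if argv and argv[0] == "sudo" else argv
    return fnmatch.fnmatchcase(" ".join(args), pattern)


if __name__ == "__main__":
    print(rule_text_matches(LOGGED_ARGV, SUDOERS_PATTERN))
```

Under this approximation the logged command does fit the rule, so the password prompt may come from somewhere else, for example how the sudoers files are evaluated on a managed machine. That is a guess, not something the log confirms.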

Anyhoozers, I thought to try:

sudo /opt/homebrew/bin/colima daemon start x86 --vmnet --very-verbose

and then:

colima start -p x86 --very-verbose

which gave all the "regular" info, and finally:

$ colima ls
PROFILE    STATUS     ARCH       CPUS    MEMORY    DISK     RUNTIME    ADDRESS
arm        Stopped    aarch64    3       6GiB      60GiB
x86        Running    x86_64     2       3GiB      60GiB    docker     192.168.106.2
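
To check for an address from a script, the table can be parsed in a few lines. This is a minimal Python sketch that assumes the whitespace-separated layout shown above and that empty cells only occur at the end of a row (as with the stopped profile); the function name is an illustrative assumption.

```python
def parse_colima_ls(text: str) -> list[dict[str, str]]:
    """Parse the `colima ls` table into one dict per profile."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        # Stopped profiles omit trailing columns; pad them with "".
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


LS_OUTPUT = """\
PROFILE    STATUS     ARCH       CPUS    MEMORY    DISK     RUNTIME    ADDRESS
arm        Stopped    aarch64    3       6GiB      60GiB
x86        Running    x86_64     2       3GiB      60GiB    docker     192.168.106.2
"""

if __name__ == "__main__":
    for row in parse_colima_ls(LS_OUTPUT):
        print(row["PROFILE"], row["ADDRESS"] or "(no address)")
```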

🎉 Oh happy day, and there was much rejoicing!

I hope this might take this by now epic saga one step further 😜