cirruslabs / tart

macOS and Linux VMs on Apple Silicon to use in CI and other automations
https://tart.run

monterey-xcode:latest does not allow ssh login #668

Closed umlaeute closed 9 months ago

umlaeute commented 9 months ago

following the docs at https://tart.run/integrations/gitlab-runner/ i use a .gitlab-ci.yml file like this:

test:
  image: ghcr.io/cirruslabs/macos-monterey-xcode:latest
  tags:
    - tart-installed
  script:
    - uname -a

and i get the following output on gitlab:

Running with gitlab-runner 16.6.0 (3046fee8)
  on mymac 4g4jdzfx, system ID: s_0b22b5a5ed7e
section_start:1700755128:prepare_executor
Preparing the "custom" executor
Using Custom executor...
2023/11/23 16:58:48 Pulling the latest version of ghcr.io/cirruslabs/macos-monterey-xcode:latest...
2023/11/23 17:15:10 Cloning and configuring a new VM...
2023/11/23 17:15:10 Waiting for the VM to boot and be SSH-able...
2023/11/23 17:28:32 VM errored: failed to connect via SSH: All attempts fail:
#1: dial tcp 192.168.65.2:22: connect: operation timed out
#2: dial tcp 192.168.65.2:22: connect: operation timed out
#3: dial tcp 192.168.65.2:22: connect: operation timed out
#4: dial tcp 192.168.65.2:22: connect: operation timed out
#5: dial tcp 192.168.65.2:22: connect: operation timed out
#6: dial tcp 192.168.65.2:22: connect: operation timed out
#7: dial tcp 192.168.65.2:22: connect: operation timed out
#8: dial tcp 192.168.65.2:22: connect: operation timed out
#9: dial tcp 192.168.65.2:22: connect: operation timed out
#10: dial tcp 192.168.65.2:22: connect: operation timed out
section_end:1700756912:prepare_executor
ERROR: Job failed: exit status 1
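
The log above shows ten connection attempts to port 22 of the VM, all timing out. To reproduce that check by hand from the host, a minimal Python sketch could look like the following. It is not the executor's own code; the function name `wait_for_tcp` and its default attempt count, timeout and delay are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_tcp(host: str, port: int = 22, attempts: int = 10,
                 timeout: float = 5.0, delay: float = 1.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds.

    Hypothetical helper: retries up to `attempts` times and prints one
    line per failed attempt, loosely mirroring the log format above.
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:  # covers timeouts and refused connections
            print(f"#{attempt}: dial tcp {host}:{port}: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    return False
```

Calling `wait_for_tcp("192.168.65.2")` on the host would show whether the address reported for the VM accepts TCP connections at all, independently of gitlab-runner.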


if i replace the image with ghcr.io/cirruslabs/macos-sonoma-base:latest or ghcr.io/cirruslabs/macos-sonoma-xcode:latest everything works as expected (therefore i concluded that this is an issue with the image itself, rather than the gitlab-tart-executor)

umlaeute commented 9 months ago

For what it is worth, I've also tried ghcr.io/cirruslabs/macos-monterey-base:latest which seems to work fine as well, so it probably is just an issue with the monterey-xcode image, or some temporary glitch on my tart-host (although I did try several times, before coming here)

fkorotkov commented 9 months ago

What is the macOS version on the host? There might be compatibility issues if it's prior to Ventura 13.3. Have you also tried manually running tart clone/tart run/tart ip on the host?

FYI we haven't updated the Monterey images in a while and are not planning to.

umlaeute commented 9 months ago

i don't know the exact macOS version on the host (as i currently do not have physical access to it), but I've upgraded to Ventura a few days ago, so I assume it is the latest and greatest Ventura available (as of today).

however, I can run this remotely:

$ sw_vers -productVersion
13.6.1

I did manually clone and start the VM, and haven't been able to ssh into (or even ping) the IP returned by tart ip, even though the host had a bridge address in the same network...

edigaryev commented 9 months ago

@umlaeute can you check if you're experiencing the symptoms similar to https://github.com/cirruslabs/tart/issues/657?

Which IP address your VM is getting (if you look from inside of the guest)?

umlaeute commented 9 months ago

i've re-run my tests and i got:

also, with today's tests this is no longer only an issue with monterey-xcode image, but with all tested images (monterey-base, monterey-xcode, sonoma-base, sonoma-xcode). so this is obviously not an issue with one image, but with the network setup of the host.

indeed, the host has two bridge interfaces (for $reasons):

$ ifconfig
[...]
bridge100: flags=8a63<UP,BROADCAST,SMART,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
    options=3<RXCSUM,TXCSUM>
    ether 9e:76:0e:a3:47:64 
    inet 192.168.66.1 netmask 0xffffff00 broadcast 192.168.66.255
    inet6 fe80::9c76:eff:fea3:4764%bridge100 prefixlen 64 scopeid 0x1c 
    inet6 fdbd:ab55:9e51:a104:14a4:c78c:623:634c prefixlen 64 autoconf secured 
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x0
    member: vmenet0 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 27 priority 0 path cost 0
    nd6 options=201<PERFORMNUD,DAD>
    media: autoselect
    status: active
[...]
bridge101: flags=8a63<UP,BROADCAST,SMART,RUNNING,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
    options=63<RXCSUM,TXCSUM,TSO4,TSO6>
    ether 9e:76:0e:a3:47:65 
    inet 192.168.67.1 netmask 0xffffff00 broadcast 192.168.67.255
    inet6 fe80::9c76:eff:fea3:4765%bridge101 prefixlen 64 scopeid 0x21 
    inet6 fd33:8aad:3604:65e5:4da:bcba:1b09:4f63 prefixlen 64 autoconf secured 
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x0
    member: vmenet1 flags=10803<LEARNING,DISCOVER,PRIVATE,CSUM>
            ifmaxaddr 0 port 32 priority 0 path cost 0
    nd6 options=201<PERFORMNUD,DAD>
    media: autoselect
    status: active
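
To see which of the two bridges a given VM address can belong to, the `inet` and hexadecimal `netmask` values from the ifconfig output above can be turned into networks and tested for membership. This is a rough sketch using Python's standard `ipaddress` module; the helper names `bridge_network` and `matching_bridges` are made up for this example, and the addresses tested are ones that appear elsewhere in this thread.

```python
import ipaddress


def bridge_network(inet: str, netmask_hex: str) -> ipaddress.IPv4Network:
    """Build the network from ifconfig's `inet` and hexadecimal `netmask` fields."""
    mask = ipaddress.IPv4Address(int(netmask_hex, 16))
    # strict=False lets us pass the bridge's own address instead of the network address
    return ipaddress.IPv4Network(f"{inet}/{mask}", strict=False)


def matching_bridges(vm_ip: str, bridges: dict) -> list:
    """Return the names of all bridges whose subnet contains vm_ip."""
    addr = ipaddress.IPv4Address(vm_ip)
    return [name for name, net in bridges.items() if addr in net]


# Values copied from the ifconfig output above
bridges = {
    "bridge100": bridge_network("192.168.66.1", "0xffffff00"),
    "bridge101": bridge_network("192.168.67.1", "0xffffff00"),
}

for ip in ("192.168.65.2", "192.168.66.5", "192.168.67.5"):
    print(ip, "->", matching_bridges(ip, bridges) or "no matching bridge")
```

An address that matches neither bridge, or only the bridge the VM is not attached to, cannot be reached from the host through the VM's interface.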

I do get the correct IP address of the host when using tart ip --resolver=arp.

i guess i should therefore re-report this either against tart or gitlab-tart-executor. which one?

edigaryev commented 9 months ago

> i guess i should therefore re-report this either against tart or gitlab-tart-executor. which one?

That's unlikely to be a Tart issue, because it never changes the default subnet for the NAT network.

Are you running some other VM virtualization solution on the same host?

umlaeute commented 9 months ago

yes (as in: qemu is also running on the host), that is what i summarized under $reasons (and I cannot change those $reasons, so not being able to run tart in parallel with qemu would be a show-stopper)

edigaryev commented 9 months ago

@umlaeute which network-specific settings are you passing to QEMU?

I would appreciate it if you could post a way to reproduce your issue.

umlaeute commented 9 months ago

to be honest, i don't really know (as the qemu process is required (and started) by a different user).

Here's the cmdline that starts qemu (as reported by ps aux):

/Library/Application Support/com.canonical.multipass/bin/qemu-system-aarch64 -machine virt,gic-version=3 -accel hvf -drive file=/Library/Application Support/com.canonical.multipass/bin/../Resources/qemu/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on -cpu host -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:ce:b0:a6 -device virtio-scsi-pci,id=scsi0 -drive file=/var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/ubuntu-22.04-server-cloudimg-arm64.img,if=none,format=qcow2,discard=unmap,id=hda -device scsi-hd,drive=hda,bus=scsi0.0 -smp 1 -m 1024M -qmp stdio -chardev null,id=char0 -serial chardev:char0 -nographic -cdrom /var/root/Library/Application Support/multipassd/qemu/vault/instances/primary/cloud-init-config.iso

so i figure the short answer to your question is -nic vmnet-shared,model=virtio-net-pci,mac=52:54:00:ce:b0:a6

edigaryev commented 9 months ago

One more thing, can you check if there is more than one entry in /var/db/dhcpd_leases for the MAC address of your macOS VM? E.g. like this:

% grep -C5 '6a:fc:68:76:eb:72' /var/db/dhcpd_leases
    lease=0x654114d1
}
{
    name=ubuntu
    ip_address=192.168.64.2
    hw_address=1,6a:fc:68:76:eb:72
    identifier=1,6a:fc:68:76:eb:72
    lease=0x65465a79
}

umlaeute commented 9 months ago

yes, there are (a number of) duplicates:

% grep -B3 -A3 "hw_address=1,6:b9:1c:70:ab:94$" /var/db/dhcpd_leases
{
    name=adminsVlMachine
    ip_address=192.168.67.5
    hw_address=1,6:b9:1c:70:ab:94
    identifier=1,6:b9:1c:70:ab:94
    lease=0x6565a345
}
--
{
    name=adminsVlMachine
    ip_address=192.168.66.5
    hw_address=1,6:b9:1c:70:ab:94
    identifier=1,6:b9:1c:70:ab:94
    lease=0x6560d808
}

for one MAC i have even a triplicate, but that goes under name=primary, so it might be unrelated (i guess the tart VMs all show up under name=adminsVlMachine)
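
Instead of grepping for one MAC address at a time, the whole lease file can be scanned for hardware addresses that occur more than once. The sketch below assumes the file consists only of `{ key=value ... }` blocks like those in the grep output above; the function names are made up, and the sample text is copied from that output rather than read from /var/db/dhcpd_leases.

```python
import re
from collections import defaultdict

# Two entries copied from the grep output above
SAMPLE = """\
{
    name=adminsVlMachine
    ip_address=192.168.67.5
    hw_address=1,6:b9:1c:70:ab:94
    identifier=1,6:b9:1c:70:ab:94
    lease=0x6565a345
}
{
    name=adminsVlMachine
    ip_address=192.168.66.5
    hw_address=1,6:b9:1c:70:ab:94
    identifier=1,6:b9:1c:70:ab:94
    lease=0x6560d808
}
"""


def parse_leases(text: str) -> list:
    """Parse `{ key=value ... }` blocks into a list of dicts."""
    leases = []
    for block in re.findall(r"\{(.*?)\}", text, flags=re.DOTALL):
        entry = {}
        for line in block.splitlines():
            if "=" in line:
                key, _, value = line.strip().partition("=")
                entry[key] = value
        leases.append(entry)
    return leases


def duplicate_macs(leases: list) -> dict:
    """Map each hw_address that has more than one lease to its IP addresses."""
    by_mac = defaultdict(list)
    for entry in leases:
        if "hw_address" in entry and "ip_address" in entry:
            by_mac[entry["hw_address"]].append(entry["ip_address"])
    return {mac: ips for mac, ips in by_mac.items() if len(ips) > 1}


print(duplicate_macs(parse_leases(SAMPLE)))
```

On a real host, the text would come from reading /var/db/dhcpd_leases, which may require elevated privileges.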

edigaryev commented 9 months ago

Please check out the new 2.4.0 release, it should filter out the duplicate DHCP leases that are expired.
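
The idea of ignoring expired duplicates can be illustrated with a small Python sketch. It is not Tart's implementation, and it rests on an assumption: that the `lease` field is a hexadecimal Unix timestamp marking when the lease expires. The function names and the reference time `now` are hypothetical; the two entries are the duplicates quoted earlier in this thread.

```python
from datetime import datetime, timezone


def lease_expiry(lease_hex: str) -> datetime:
    """Interpret the lease field as a hex Unix timestamp (assumption)."""
    return datetime.fromtimestamp(int(lease_hex, 16), tz=timezone.utc)


def pick_lease(entries, now):
    """Drop expired entries; return the one that expires last, or None if all expired."""
    live = [e for e in entries if lease_expiry(e["lease"]) > now]
    return max(live, key=lambda e: lease_expiry(e["lease"]), default=None)


# The two duplicate entries for the same MAC address
entries = [
    {"ip_address": "192.168.67.5", "lease": "0x6565a345"},
    {"ip_address": "192.168.66.5", "lease": "0x6560d808"},
]

# Hypothetical reference time, chosen to fall between the two expiry times
now = datetime(2023, 11, 27, tzinfo=timezone.utc)

for e in entries:
    print(e["ip_address"], "expires", lease_expiry(e["lease"]).isoformat())
print("selected:", pick_lease(entries, now))
```

Under that assumption, the 192.168.66.5 entry is the stale one at the chosen reference time, so only 192.168.67.5 remains as a candidate.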

umlaeute commented 9 months ago

did a quick test today:

test setup

tool                  version
tart                  2.4.0
gitlab-tart-executor  1.5.0-0cc5be4
gitlab-runner         16.6.1

result

all my VMs start (repeatedly) and can be accessed by gitlab-runner.

:tada:

thanks a lot for the quick fix.