cirruslabs / orchard

Orchestrator for running Tart Virtual Machines on a cluster of Apple Silicon devices

Failing to launch VM with error "The maximum supported number of active virtual machines has been reached" #138

Closed ruimarinho closed 11 months ago

ruimarinho commented 11 months ago

Hi,

I was surprised to see many VMs failing to launch today due to the following error:

{"level":"error","ts":1695812933.623549,"msg":"VM failed: tart command returned non-zero exit code: \"Error Domain=VZErrorDomain Code=6 \\\"The maximum supported number of active virtual machines has been reached.\\\" UserInfo={NSLocalizedFailure=The number of virtual machines exceeds the limit., NSLocalizedFailureReason=The maximum supported number of active virtual machines has been reached.}\"","vm_uid":"d6871246-d5d7-454b-ad70-f75e4b512264","vm_name":"01HBB57MDZ6ZX0H511B51FEZ8Z","vm_restart_count":0}
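For anyone who wants to spot this failure in worker logs automatically, here is a minimal sketch that parses a log line like the one above and checks for the error domain and code it contains. The function name `is_vm_limit_error` is hypothetical, and the assumption that `VZErrorDomain` with `Code=6` always means "VM limit reached" comes only from this single log line, not from Orchard's or Tart's documentation.

```python
import json
import re

# The log line from this report, kept verbatim in a raw string so the
# JSON escapes (\" and \\\") survive unchanged.
SAMPLE_LINE = r'''{"level":"error","ts":1695812933.623549,"msg":"VM failed: tart command returned non-zero exit code: \"Error Domain=VZErrorDomain Code=6 \\\"The maximum supported number of active virtual machines has been reached.\\\" UserInfo={NSLocalizedFailure=The number of virtual machines exceeds the limit., NSLocalizedFailureReason=The maximum supported number of active virtual machines has been reached.}\"","vm_uid":"d6871246-d5d7-454b-ad70-f75e4b512264","vm_name":"01HBB57MDZ6ZX0H511B51FEZ8Z","vm_restart_count":0}'''

# Matches the "Error Domain=<domain> Code=<number>" fragment inside "msg".
_NS_ERROR = re.compile(r"Error Domain=(\w+) Code=(\d+)")


def is_vm_limit_error(log_line: str) -> bool:
    """Return True if the line is a JSON record whose "msg" field carries
    the VZErrorDomain Code=6 error seen in this report."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # not a JSON log line
    if not isinstance(record, dict):
        return False
    match = _NS_ERROR.search(str(record.get("msg", "")))
    return (
        match is not None
        and match.group(1) == "VZErrorDomain"
        and match.group(2) == "6"
    )


if __name__ == "__main__":
    print(is_vm_limit_error(SAMPLE_LINE))
```

A filter like this could feed an alert, so the stuck host is noticed before many launches have failed.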

Initially this seemed suspicious as the host is set to run two tart VMs at most, but looking at running VMs, I see two of them with an "Application Not Responding" flag, which is a first for me ever since we've switched to headless mode.

Force quitting one of the VMs allowed Orchard to start launching new VMs, but I guess my idea here would be to:

1) Force quit VMs if they become unresponsive to allow for an automated approach of cleaning stale VMs.
2) Investigate how and why VMs in headless mode can become non-responsive when running exactly the same code/payload as non-headless ones.
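To make the first idea a bit more concrete, here is a rough sketch of the selection step only: given a snapshot of VM processes and how long each has been unresponsive, pick the ones past a grace period. Everything here is hypothetical (`VMProcess`, `select_stale_vms`, the 60-second default), and how unresponsiveness would actually be detected is left open. It is not how Orchard or the linked PR handles this.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VMProcess:
    """Hypothetical snapshot of one VM process on a worker host."""
    pid: int
    name: str
    unresponsive_seconds: float  # 0.0 while the process is responding


def select_stale_vms(
    processes: Iterable[VMProcess],
    grace_seconds: float = 60.0,  # assumed threshold, tune per host
) -> List[VMProcess]:
    """Return the VMs that have been unresponsive for at least
    grace_seconds, in their original order."""
    return [p for p in processes if p.unresponsive_seconds >= grace_seconds]


if __name__ == "__main__":
    snapshot = [
        VMProcess(pid=101, name="vm-a", unresponsive_seconds=0.0),
        VMProcess(pid=102, name="vm-b", unresponsive_seconds=300.0),
    ]
    for vm in select_stale_vms(snapshot):
        # A real cleaner would terminate the process here; this only reports.
        print(f"would force quit {vm.name} (pid {vm.pid})")
```

The grace period is there so that a VM that is only briefly busy is not killed by mistake.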

edigaryev commented 11 months ago

Hi Rui 👋

Thanks for reporting this. I think I've nailed down the potential errors that might result in this behavior in https://github.com/cirruslabs/orchard/pull/139, and the new release 0.13.1 that includes these changes will be available shortly.

ruimarinho commented 11 months ago

Yep, already in the process of upgrading :) thank you for investigating this. 0.13.0 feels much more solid already!