Automattic / hostmgr

A tool for managing macOS VM hosts
Mozilla Public License 2.0

Fix the failed to unregister the VM error #69

Closed: crazytonyli closed this PR 8 months ago.

crazytonyli commented 9 months ago

Issue

Starting last week, there have been many "failed to unregister VM" errors in our CI jobs. The error occurs in the `hostmgr vm cleanup` command, which runs in our pre-exit agent hook:

```
~~~ Cleaning Up
Removing Registered VM xcode-15.0
Error: ShellOut encountered an error
Status code: 255
Message: "Failed to unregister the VM: Unable to perform the action because the virtual machine is busy. The virtual machine is currently running. Please try again later."
Output: ""
```

Root cause

The error is thrown by the `prlctl unregister <vm>` command. The command failed because the VM was in an invalid state: its files had been deleted. I'm not entirely sure how Parallels ended up in that state, though.

Changes

This PR adds a step that unregisters all invalid VMs first, before the existing step that unregisters all VMs.
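A minimal Swift sketch of the ordering described above, for illustration only. It is not hostmgr's code: `RegisteredVM`, `partitionVMs`, `cleanupOrder`, and `cleanUp` are hypothetical names, and the sketch assumes a VM counts as "invalid" when its `.pvm` bundle no longer exists on disk (the situation described under Root cause). It does not model the "Killing running virtual machines" step. In real use, the `unregister` closure would shell out to `prlctl unregister <vm>`.

```swift
import Foundation

// Hypothetical model of a VM registered with Parallels (not hostmgr's actual type).
struct RegisteredVM {
    let name: String
    let bundlePath: String  // assumed: path to the VM's .pvm bundle
}

/// Splits VMs into "invalid" ones (bundle missing on disk) and the rest.
func partitionVMs(_ vms: [RegisteredVM],
                  bundleExists: (String) -> Bool) -> (invalid: [RegisteredVM], valid: [RegisteredVM]) {
    var invalid: [RegisteredVM] = []
    var valid: [RegisteredVM] = []
    for vm in vms {
        if bundleExists(vm.bundlePath) {
            valid.append(vm)
        } else {
            invalid.append(vm)
        }
    }
    return (invalid, valid)
}

/// Returns VM names in unregister order: invalid VMs first, then the remaining ones.
func cleanupOrder(for vms: [RegisteredVM],
                  bundleExists: (String) -> Bool = { FileManager.default.fileExists(atPath: $0) }) -> [String] {
    let (invalid, valid) = partitionVMs(vms, bundleExists: bundleExists)
    return invalid.map(\.name) + valid.map(\.name)
}

/// Unregisters every VM in cleanup order via the injected `unregister` closure
/// (assumed to wrap `prlctl unregister <vm>`) and returns the names it processed.
func cleanUp(_ vms: [RegisteredVM],
             bundleExists: (String) -> Bool,
             unregister: (String) throws -> Void) rethrows -> [String] {
    let order = cleanupOrder(for: vms, bundleExists: bundleExists)
    for name in order {
        try unregister(name)
    }
    return order
}
```

Injecting `bundleExists` and `unregister` keeps the ordering logic testable without Parallels Desktop installed; the default `bundleExists` checks the real filesystem.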

Please note: this PR is a patch on top of 0.15.13, because our agents currently run 0.15.13 and the latest release, 0.17.x, appears to have issues on our agents (which is why https://github.com/Automattic/hostmgr/pull/68 failed).

Test Instructions

1. On your Intel Mac, launch a VM using Parallels Desktop.
2. Right-click the VM in Parallels Desktop's Control Center and click "Show in Finder".
3. Delete the VM files (.pvm) while the VM is running.
4. Check out this PR branch and run `swift run hostmgr vm clean`.

The command should exit with status zero, and its output should look something like:

```
Found 1 invalid VMs
Killing running virtual machines
Unregistering invalid VMs
Removing Registered VM ...
...
```

Next

After this PR is merged, I will create a new 0.15.14 release and deploy it to all macOS agents. I'll also create a PR to cherry-pick the bug fix commit onto the trunk branch.

crazytonyli commented 8 months ago

I'll close this PR because a similar solution has already been implemented in our internal tools.