cirruslabs / orchard

Orchestrator for running Tart Virtual Machines on a cluster of Apple Silicon devices

Orchard keeps trying to delete a tart VM that doesn't exist - and failing #120

Closed: roblabla closed this issue 12 months ago

roblabla commented 12 months ago

I have a VM that failed to be created in Tart because it referenced an image that doesn't exist (nice typo there). As a result, the Tart VM was never created, and Orchard put the VM into the failed state. However, when I try to remove it, Orchard keeps running tart delete orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0. That command fails because the VM never existed in the first place, and Orchard retries it a couple of seconds later. Full logs:

{"level":"info","ts":1694421867.788933,"msg":"registered worker pineapple"}
{"level":"info","ts":1694421867.7902699,"msg":"syncing on-disk VMs..."}
{"level":"info","ts":1694421867.817597,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"warn","ts":1694421867.832927,"msg":"'tart stop orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0' failed with exit code 1: the specified VM \"orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0\" does not exist","vm_uid":"36a11586-430f-4533-a11b-13bf178a7f20","vm_name":"roblabla","vm_restart_count":0}
{"level":"warn","ts":1694421867.848404,"msg":"'tart delete orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0' failed with exit code 1: Error: “orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0” couldn’t be removed.","vm_uid":"36a11586-430f-4533-a11b-13bf178a7f20","vm_name":"roblabla","vm_restart_count":0}
{"level":"warn","ts":1694421867.848452,"msg":"failed to sync VMs: VM failed: failed to delete VM: tart command returned non-zero exit code: \"Error: “orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0” couldn’t be removed.\""}
{"level":"warn","ts":1694421867.848562,"msg":"failed to watch RPC: rpc error: code = Canceled desc = context canceled"}

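This also seems to break the worker's RPC connection: right after the failed sync, the log shows "failed to watch RPC: rpc error: code = Canceled desc = context canceled".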

I believe the code responsible for deleting VMs should tolerate a failed tart delete as long as the VM no longer appears in Tart's list of VMs, since the desired end state (no such VM) has already been reached.
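A minimal Go sketch of the proposed behaviour, assuming a hypothetical vmTool interface whose Delete wraps tart delete and whose List returns the names of local VMs (how those names are obtained from tart is left out and is an assumption). The delete is attempted first; only if it fails is the list consulted, and a VM that is not listed counts as successfully deleted:

```go
package main

import "fmt"

// vmTool abstracts the two tart operations this sketch needs. In a real worker,
// Delete would shell out to `tart delete <name>` and List would return the local
// VM names (e.g. parsed from `tart list`); both are hypothetical, not Orchard's API.
type vmTool interface {
	Delete(name string) error
	List() ([]string, error)
}

func contains(names []string, name string) bool {
	for _, n := range names {
		if n == name {
			return true
		}
	}
	return false
}

// deleteIdempotent deletes the named VM, but treats a failed delete as success
// when the VM no longer appears among the local VMs (e.g. it was never created).
func deleteIdempotent(tool vmTool, name string) error {
	deleteErr := tool.Delete(name)
	if deleteErr == nil {
		return nil
	}
	names, listErr := tool.List()
	if listErr != nil {
		// Existence is unknown, so keep reporting the original failure.
		return fmt.Errorf("failed to delete VM %q: %v (listing VMs also failed: %w)", name, deleteErr, listErr)
	}
	if !contains(names, name) {
		// The desired end state (no such VM) is already reached.
		return nil
	}
	return fmt.Errorf("failed to delete VM %q: %w", name, deleteErr)
}

// fakeTool is an in-memory stand-in used only to demonstrate the behaviour.
type fakeTool struct {
	vms         []string
	deleteFails bool // simulate a delete that fails although the VM exists
}

func (f *fakeTool) Delete(name string) error {
	if f.deleteFails || !contains(f.vms, name) {
		return fmt.Errorf("%q couldn't be removed", name)
	}
	kept := f.vms[:0]
	for _, n := range f.vms {
		if n != name {
			kept = append(kept, n)
		}
	}
	f.vms = kept
	return nil
}

func (f *fakeTool) List() ([]string, error) { return f.vms, nil }

func main() {
	tool := &fakeTool{vms: []string{"vm-a"}}
	fmt.Println(deleteIdempotent(tool, "vm-a")) // <nil>
	fmt.Println(deleteIdempotent(tool, "orchard-roblabla-36a11586-430f-4533-a11b-13bf178a7f20-0")) // <nil>

	stuck := &fakeTool{vms: []string{"vm-b"}, deleteFails: true}
	fmt.Println(deleteIdempotent(stuck, "vm-b")) // failed to delete VM "vm-b": "vm-b" couldn't be removed
}
```

Consulting the list only after a failure keeps the common path to a single tart call, and if listing fails as well the error is still surfaced, so a VM that might still exist is not silently forgotten.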