cirruslabs / orchard

Orchestrator for running Tart Virtual Machines on a cluster of Apple Silicon devices

VMs with a slash in their name cause a lot of failures. #130

Closed: roblabla closed this issue 11 months ago

roblabla commented 11 months ago

I was trying to find some way to namespace VMs in my cluster, so I tried putting a / in the VM name. That led to some fun results: the VM never started, and it could not be deleted:

$ orchard create vm  --image ghcr.io/cirruslabs/macos-ventura-base:latest test/roblabla
$ orchard list vms
Name            Image                                               Status  Restart policy
test/roblabla   harbor.huruk.ai/vms-tart/vm-macos12-arm64:latest    pending Never (0 restarts)
$ orchard delete vm test/roblabla
2023/09/22 18:44:33 API client encountered an API error to make a request: 404 Not Found

According to the worker log, the VM never spawned because tart apparently did not accept a name containing a slash:

{"level":"warn","ts":1695400981.9457111,"msg":"'tart clone harbor.huruk.ai/vms-tart/vm-macos12-arm64:latest orchard-test/roblabla-985f4082-8677-4f8d-89ca-a3ef2bd1256f-0' failed with exit code 64: Error: <new-name> should be a local name","vm_uid":"985f4082-8677-4f8d-89ca-a3ef2bd1256f","vm_name":"test/roblabla","vm_restart_count":0}
{"level":"error","ts":1695400981.945843,"msg":"failed to clone the VM: tart command returned non-zero exit code: \"Error: <new-name> should be a local name\"","vm_uid":"985f4082-8677-4f8d-89ca-a3ef2bd1256f","vm_name":"test/roblabla","vm_restart_count":0}
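The log suggests that the worker builds the local tart name as orchard-<name>-<uid>-<restart count>, so a slash in the orchard name ends up in the tart name. One way to avoid this is to reject such names when the VM is created. The sketch below is a hypothetical validator, not orchard's actual code; the allowed character set (letters, digits, '.', '_', '-', starting with a letter or digit) is an assumption chosen so that a name is always a single URL path segment and contains no slash for tart to reject.

```go
package main

import (
	"fmt"
	"regexp"
)

// validVMName is a hypothetical rule: start with a letter or digit, then
// letters, digits, '.', '_' or '-'. It excludes '/', which tart rejected in
// the log above, and names like "." or ".." that routers may normalize away.
var validVMName = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]*$`)

// validateVMName returns an error for names that cannot safely be used both
// as part of a tart local VM name and as a single REST path segment.
func validateVMName(name string) error {
	if !validVMName.MatchString(name) {
		return fmt.Errorf("invalid VM name %q: only letters, digits, '.', '_' and '-' are allowed", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"roblabla", "test-roblabla", "test/roblabla"} {
		if err := validateVMName(name); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q is OK\n", name)
	}
}
```

Rejecting the name up front means the controller never stores a VM that no worker can clone and that later breaks the REST paths.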

Furthermore, when the worker tried to report the VM as failed to the controller, the request itself failed, again probably because of the slash in the name:

{"level":"warn","ts":1695400981.947734,"msg":"failed to sync VMs: API client encountered an API error to make a request: 404 Not Found"}
{"level":"warn","ts":1695400981.948169,"msg":"failed to watch RPC: rpc error: code = Canceled desc = context canceled"}
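One plausible reading of this 404 is that the name was placed into the request path unescaped, so test/roblabla turned into two path segments. The sketch below shows the client-side difference using Go's standard url.PathEscape; the vmPath helper is hypothetical, and the /v1/vms/ prefix is taken from the curl request later in this thread.

```go
package main

import (
	"fmt"
	"net/url"
)

// vmPath is a hypothetical helper that builds the REST path for one VM.
// url.PathEscape encodes '/' as %2F, so the name stays a single path segment.
func vmPath(name string) string {
	return "/v1/vms/" + url.PathEscape(name)
}

func main() {
	naive := "/v1/vms/" + "test/roblabla"
	fmt.Println(naive)                   // one segment too many for a ":name" route
	fmt.Println(vmPath("test/roblabla")) // slash kept inside the segment as %2F
}
```

Escaping on the client is not sufficient by itself, though: as the last comment in this thread points out, the server may decode %2F back into a slash before matching the route.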

This failure to communicate with the controller then put the worker into a loop, re-registering and failing to sync roughly every five seconds:

{"level":"info","ts":1695400986.916616,"msg":"registered worker pineapple"}
{"level":"info","ts":1695400986.918817,"msg":"syncing on-disk VMs..."}
{"level":"info","ts":1695400986.947737,"msg":"syncing 3 local VMs against 3 remote VMs..."}
{"level":"warn","ts":1695400986.94867,"msg":"failed to sync VMs: API client encountered an API error to make a request: 404 Not Found"}
{"level":"warn","ts":1695400986.9487,"msg":"failed to watch RPC: rpc error: code = Canceled desc = context canceled"}
{"level":"info","ts":1695400991.8975968,"msg":"registered worker pineapple"}
{"level":"info","ts":1695400991.8994482,"msg":"syncing on-disk VMs..."}
{"level":"info","ts":1695400991.9237912,"msg":"syncing 3 local VMs against 3 remote VMs..."}
{"level":"warn","ts":1695400991.924768,"msg":"failed to sync VMs: API client encountered an API error to make a request: 404 Not Found"}
{"level":"warn","ts":1695400991.9248068,"msg":"failed to watch RPC: rpc error: code = Canceled desc = context canceled"}
{"level":"info","ts":1695400996.9000902,"msg":"registered worker pineapple"}
{"level":"info","ts":1695400996.901927,"msg":"syncing on-disk VMs..."}
{"level":"info","ts":1695400996.9255302,"msg":"syncing 3 local VMs against 3 remote VMs..."}
{"level":"warn","ts":1695400996.926463,"msg":"failed to sync VMs: API client encountered an API error to make a request: 404 Not Found"}
{"level":"warn","ts":1695400996.926503,"msg":"failed to watch RPC: rpc error: code = Canceled desc = context canceled"}
edigaryev commented 11 months ago

That should be resolved once https://github.com/cirruslabs/orchard/pull/129 is merged.

roblabla commented 11 months ago

Is there any way I can repair my cluster, or should I just nuke it? It looks impossible to delete the "broken" VM; even going through curl doesn't work:

$ curl -u <redacted> -X DELETE https://myorchardcluster/v1/vms/test%2froblabla
404 page not found

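The reason seems to be that the request path is unescaped very early (Go's net/url decodes %2f back into / in URL.Path), and Gin then matches its URL parameters on that already-unescaped path, so test/roblabla no longer fits into a single parameter. This happens unless some options, such as UseRawPath, are set (see https://github.com/gin-gonic/gin/blob/c2ba8f19ec19914b73290c53a32de479cd463555/gin.go#L126 )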