cirruslabs / orchard

Orchestrator for running Tart Virtual Machines on a cluster of Apple Silicon devices

Make (bridged) VM IP available in the controller API #123

Closed roblabla closed 2 months ago

roblabla commented 12 months ago

It'd be nice for custom tooling to have the IP address of the VM available in the API. Currently, the only way to interact with the VM for tooling is through the ssh port-forwarding, which is clunky to use.

edigaryev commented 12 months ago

Hello Robin,

I'm not yet sure if exposing the VM's IP in the API would be the right way to proceed because we cannot guarantee that this IP will be reachable to every client that can already access the API.

Do you mind elaborating a bit more on the clunkiness of the current SSH port-forwarding implementation, perhaps there is something we can improve there instead?

roblabla commented 12 months ago

Sure! Sorry, that issue was a bit succinct and lacking context - it was made in a bit of a hurry 😓

At $WORK, we have a custom gitlab-runner based on kubevirt to run various VM-based workloads, and we are looking to integrate a similar orchard-based runner for macOS arm64 VMs. To this end, I'm writing an abstraction layer that allows interacting with kubevirt and orchard through a common Python API.

Our runner is written in Python, and the way custom runners work in GitLab is that you give GitLab different scripts to run for the various stages in the lifecycle of a job. It usually follows three stages: prepare -> run (multiple times) -> cleanup. So in a simple case, our program ends up being launched at least three times:

orchard-runner prepare
orchard-runner run path/to/script/to/run.sh
orchard-runner cleanup
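
A minimal sketch of that driver shape (the `orchard` CLI subcommands, the VM name, and the `admin` user here are illustrative assumptions, not necessarily the real invocations):

```python
# Hypothetical GitLab custom-executor driver skeleton: maps each lifecycle
# stage to the argv the runner would spawn. All names are illustrative only.

def stage_argv(stage, vm_ip=None, script=None):
    """Return the command the runner spawns for a given GitLab stage."""
    if stage == "prepare":
        # create the ephemeral VM for this job (hypothetical CLI invocation)
        return ["orchard", "create", "vm", "ci-job-vm"]
    if stage == "run":
        # with a reachable bridged IP, plain SSH suffices -- no proxy hop
        return ["ssh", f"admin@{vm_ip}", "bash", script]
    if stage == "cleanup":
        return ["orchard", "delete", "vm", "ci-job-vm"]
    raise ValueError(f"unknown stage: {stage}")
```

The `run` stage is the one that benefits from knowing the VM's IP: the runner can exec `ssh` directly instead of tunnelling through the controller.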

The bulk of the runner relies on spawning ssh to connect to the VM and run the script. Connecting directly via IP:port (instead of going through orchard ssh) is desirable for multiple reasons:

Because of this, I essentially have two choices:

There are essentially two possible approaches to port-forwarding in our runner (based on our experience with kubevirt):

Each comes with its own set of upsides and downsides, but the gist is that port-forwarding means a lot more work and more things that can go wrong. Furthermore, while I don't know if orchard suffers from this, I saw some interesting behavior with long-running SSH connections going through the Kubernetes port-forward proxy, where the connection would occasionally time out.

A direct connection over a bridged network involves fewer moving parts and results in a more reliable system that's easier to debug when it goes wrong.


we cannot guarantee that this IP will be reachable to every client that can already access the API.

In my network, I know that if I bridge the address, it will be reachable to all the clients that use the API. While I understand this isn't necessarily the case for everyone, I still think it'd be helpful to return the IP. Kubevirt returns the IP, and it's in a similar situation where the IP may not be publicly reachable. I think it should be up to the network operator to set things up if they want the IP returned by the orchard cluster to be reachable.


FWIW I started work on doing this since I need it in the short-term (see this branch).

roblabla commented 11 months ago

Been using this patch in my PoC cluster and it works nicely. The only downside is that if the IP of the VM ever changes, orchard will keep reporting the old one; it has no mechanism to detect a new IP. That's fine for my use case (I don't expect my VMs to ever change IP), but it may not work for other people. I don't think it's easily fixable, though: AFAICT, the only way to reliably get notified when the IP changes is to have an agent running inside the VM.
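
One partial client-side mitigation (a sketch; `resolve_ip` stands in for whatever lookup the controller API offers, and the `admin` user is a placeholder) is to re-resolve the IP right before each connection instead of caching it:

```python
def ssh_argv(resolve_ip, vm_name, command, user="admin"):
    """Build the ssh argv with a freshly resolved IP.

    `resolve_ip` is a callable (vm_name -> ip) wrapping the controller API,
    so a changed IP is picked up on the next connection rather than never.
    """
    ip = resolve_ip(vm_name)  # fresh lookup on every call, no caching
    return ["ssh", f"{user}@{ip}", command]
```

This doesn't help a connection that is already open when the IP changes, but it keeps new connections from using a stale address.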

eecsmap commented 2 months ago

This feature is still good to have. I put some details in https://github.com/cirruslabs/orchard/issues/176#issuecomment-2203835609

edigaryev commented 2 months ago

Please check out the new 0.22.0 release; it exposes a new GET /v1/<VM name>/ip endpoint that resolves the VM's actual IP on the worker.

Both the controller and workers need to be updated for this to work.
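
For example, with Python's standard library (the controller URL and VM name are placeholders, and the JSON shape of the response is an assumption; check the release notes for the exact schema):

```python
import json
import urllib.request

def vm_ip_url(controller, vm_name):
    # Endpoint shape as described above: GET /v1/<VM name>/ip
    return f"{controller}/v1/{vm_name}/ip"

def fetch_vm_ip(controller, vm_name):
    """Ask the controller to resolve the VM's current IP on its worker."""
    with urllib.request.urlopen(vm_ip_url(controller, vm_name)) as resp:
        return json.load(resp)["ip"]  # assumed response shape: {"ip": "..."}
```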

eecsmap commented 2 months ago

Verified, it works as expected! Thanks!