Closed eecsmap closed 3 months ago
I had a small health check on the client side and noticed I was restarting the orchard worker due to the lack of a heartbeat. Turns out it was due to pulling an updated image layer. This would be a welcome improvement!
Please check out the new 0.16.1 version that was just released; it now features asynchronous VM creation, which should help with the issue you're experiencing.
I updated the controller to 0.16.1:
orchard --version
orchard version 0.16.1-510a259
Then I created test_async, yet noticed that the heartbeat of worker 004 is still paused by the pending VM.
orchard list vms
Name        Created        Image                                         Status   Restart policy          Assigned worker
test_async  2 minutes ago  oci-registry-dev-local/macos-ventura-vanilla  pending  OnFailure (0 restarts)  mac-studio-004.local
dev@macstudio001 ~ % orchard list workers
Name                  Last seen      Scheduling paused
mac-studio-001.local  1 second ago   false
mac-studio-002.local  3 seconds ago  false
mac-studio-003.local  2 seconds ago  false
mac-studio-004.local  2 minutes ago  false
> I update the controller with 0.16.1
Could you please update the worker too?
We should probably clarify this better in the release notes.
Works! Thanks.
I have Jenkins dynamically allocate VM instances on demand. When a VM is allocated to a worker and its image is not in the worker's cache, the worker starts pulling the image, which is fine. However, since the image is fairly large (~50GB), the pull takes more than 3 minutes. During that time the VM is marked as pending and the worker stops sending heartbeats. After 3 minutes, the worker is considered disconnected and the VM is marked as failed.
So, can we put the heartbeat in a separate thread, and add a pulling status for VMs?