mzur closed this issue 5 years ago
This should be an extension of the biigle/laravel-remote-queue package, called biigle/remote-queue-openstack.
Maybe do not boot up and delete instances. Instead, use existing instances and suspend/resume them as needed. This should be faster and much easier to configure, as the machines don't need any provisioning scripts. Suspending instances seems to free up resources, too, which is what we want here.
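To make the suspend/resume idea more concrete, here is a rough sketch in Python (the actual package would be PHP, so treat this as pseudocode that happens to run). Everything in it is an assumption for illustration: `ComputeApi`, `SuspendableInstance`, `acquire`, `suspend_if_idle` and the idle timeout are hypothetical names, not part of any existing package or of the OpenStack API. It also ignores whether a job is still running on the instance, which a real implementation would have to check before suspending.

```python
import time
from dataclasses import dataclass
from typing import Callable, Protocol


class ComputeApi(Protocol):
    """Hypothetical minimal interface to the compute service (an assumption)."""

    def suspend(self, instance_id: str) -> None: ...

    def resume(self, instance_id: str) -> None: ...


@dataclass
class SuspendableInstance:
    """Wraps one existing instance that is resumed on demand and suspended when idle."""

    api: ComputeApi
    instance_id: str
    idle_timeout: float = 300.0  # seconds without use before suspending (assumed value)
    clock: Callable[[], float] = time.monotonic
    suspended: bool = True
    last_used: float = 0.0

    def acquire(self) -> str:
        """Resume the instance if it is suspended and record the time of use."""
        if self.suspended:
            self.api.resume(self.instance_id)
            self.suspended = False
        self.last_used = self.clock()
        return self.instance_id

    def suspend_if_idle(self) -> bool:
        """Suspend the instance if it has not been used for idle_timeout seconds."""
        idle_for = self.clock() - self.last_used
        if not self.suspended and idle_for >= self.idle_timeout:
            self.api.suspend(self.instance_id)
            self.suspended = True
            return True
        return False


class FakeApi:
    """In-memory stand-in that only records calls, so the sketch runs without a cloud."""

    def __init__(self) -> None:
        self.calls = []

    def suspend(self, instance_id: str) -> None:
        self.calls.append(("suspend", instance_id))

    def resume(self, instance_id: str) -> None:
        self.calls.append(("resume", instance_id))
```

The queue would call `acquire()` before submitting a job and something like a scheduled task would call `suspend_if_idle()` periodically. No provisioning is involved because the instance already exists.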
However, we might need to extend php-opencloud/openstack to support this.
I implemented biigle/laravel-cached-openstack which makes it possible to share cached authentication tokens between different packages (e.g. biigle/laravel-image-cache). The OpenStack queue should use this package, too.
I opened php-opencloud/openstack#271 which implements resuming and suspending of instances.
This won't be implemented for now. The php-opencloud repo seems to be inactive, and we wouldn't gain much from automatically suspending/resuming instances (nobody has complained about us blocking GPUs so far).
The OpenStack queue (as described here) could have basic load balancing capabilities. This means that, if an OpenStack compute instance running the BIIGLE GPU server is currently busy, the queue boots up another compute instance (up to n in total) and submits new jobs round-robin to all available instances. Each instance is deleted again once it has been idle for a while.
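A minimal sketch of the load balancing described above, written in Python only for illustration (the real package would be PHP). All names are hypothetical: `RoundRobinPool`, `pick_instance`, `finish`, `reap_idle`, and the `boot`/`delete` callbacks stand in for whatever the OpenStack client ends up providing. It assumes "busy" means "has at least one unfinished job" and that jobs can queue on an instance once the limit n is reached.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    instance_id: str
    pending: int = 0        # jobs submitted to this instance and not yet finished
    last_used: float = 0.0  # time of the last submit or finish


class RoundRobinPool:
    def __init__(
        self,
        boot: Callable[[], str],        # boots an instance, returns its ID (assumed)
        delete: Callable[[str], None],  # deletes an instance by ID (assumed)
        max_instances: int,             # the "n" from the description
        idle_timeout: float,            # seconds an instance may stay idle
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.boot = boot
        self.delete = delete
        self.max_instances = max_instances
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.instances: List[Instance] = []
        self._cursor = 0

    def pick_instance(self) -> str:
        """Return the instance ID the next job should be submitted to."""
        # Scale up only if every instance is busy and the limit is not reached.
        all_busy = all(i.pending > 0 for i in self.instances)
        if all_busy and len(self.instances) < self.max_instances:
            self.instances.append(Instance(self.boot()))
        # Plain round robin over all currently known instances.
        instance = self.instances[self._cursor % len(self.instances)]
        self._cursor += 1
        instance.pending += 1
        instance.last_used = self.clock()
        return instance.instance_id

    def finish(self, instance_id: str) -> None:
        """Record that one job on the given instance has completed."""
        for instance in self.instances:
            if instance.instance_id == instance_id:
                instance.pending -= 1
                instance.last_used = self.clock()

    def reap_idle(self) -> List[str]:
        """Delete instances with no pending jobs that exceeded the idle timeout."""
        now = self.clock()
        idle = [
            i for i in self.instances
            if i.pending == 0 and now - i.last_used >= self.idle_timeout
        ]
        for instance in idle:
            self.delete(instance.instance_id)
            self.instances.remove(instance)
        return [i.instance_id for i in idle]
```

Note that this version round-robins over all instances, including busy ones, so a new job may land behind a long-running one. Preferring idle instances first would be a small change in `pick_instance`.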