Plugin Idea: Dynamic temporary virtual machine workers via DigitalOcean

kfatehi commented 9 years ago

A new Runner plugin, most similar to the Docker runner, except that instead of creating a Docker container it uses the DigitalOcean API to spin up a server, waits for an IP address and shell access, and then unblocks so that the job can proceed over SSH as normal.

https://github.com/keyvanfatehi/saasbox-app/blob/master/src/workers/instance_provisioner/index.js#L35-L70
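The "wait for IP and shell access, then unblock" step could be sketched roughly as below. This is only an illustration in Python, not the code behind the link above: `wait_for_worker`, `get_ip`, and `ssh_port_open` are hypothetical names, and a successful TCP connection to port 22 is used as a stand-in for "shell access".

```python
import socket
import time
from typing import Callable, Optional


def ssh_port_open(ip: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SSH port succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_worker(
    get_ip: Callable[[], Optional[str]],
    probe: Callable[[str], bool] = ssh_port_open,
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Block until the new server has an IP and answers the probe.

    get_ip is a hypothetical callback that asks the provider API for the
    server's address and returns None while none has been assigned yet.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        ip = get_ip()
        if ip is not None and probe(ip):
            return ip  # unblock: the runner can now continue over SSH
        sleep(interval)
    raise TimeoutError("worker did not become reachable in time")
```

The `sleep` and `clock` parameters are injectable only so that the loop can be exercised without real waiting.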

knownasilya commented 9 years ago

Awesome idea! I would love to tackle this, since I use DO for my personal projects.

garymcleanhall commented 9 years ago

@knownasilya If you want any help, let me know and I'd like to contribute.

knownasilya commented 9 years ago

@garymcleanhall I'm limited on time right now, so feel free to start :+1:.

Submit your PRs here: https://github.com/Strider-CD/strider-do-runner

If you plan on contributing, we can add you as a contributor there.

niallo commented 9 years ago

Something to think about (more general than just DO) is how to manage usage-based scaling (or auto-scaling) of worker boxes.

This is basically a must-have for larger shops, where job load can spike during the day and then is fairly idle overnight.

Any thoughts?

kfatehi commented 9 years ago

Niall, I was thinking that parallel jobs would trigger parallel VMs, like how the Docker runner works. You still incur the same cost, just in a shorter period of time compared to serial. When a job completes, the VM is destroyed. The admin must design a template for the VM, indicating RAM, CPU, price, etc. Not sure if I understood your concern fully, though.
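A minimal sketch of the one-VM-per-job idea, in Python for illustration only. `VMTemplate` and its fields, and the `create_vm`, `destroy_vm`, and `run` callbacks, are all hypothetical stand-ins for whatever the plugin and provider would actually expose.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class VMTemplate:
    """Admin-defined VM shape (hypothetical fields)."""
    name: str
    ram_mb: int
    cpus: int
    price_per_hour: float


def run_job_on_fresh_vm(job: Any, template: VMTemplate,
                        create_vm: Callable, destroy_vm: Callable,
                        run: Callable) -> Any:
    """Create a VM for one job and always destroy it afterwards."""
    vm = create_vm(template)
    try:
        return run(vm, job)
    finally:
        destroy_vm(vm)  # runs even if the job raised


def run_jobs_in_parallel(jobs: Sequence[Any], template: VMTemplate,
                         create_vm: Callable, destroy_vm: Callable,
                         run: Callable) -> List[Any]:
    """Run every job on its own VM at the same time; results keep job order."""
    if not jobs:
        return []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [
            pool.submit(run_job_on_fresh_vm, job, template,
                        create_vm, destroy_vm, run)
            for job in jobs
        ]
        return [f.result() for f in futures]
```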

niallo commented 9 years ago

Sure, I understand. This is great.

But booting VMs has much greater overhead than spinning up Docker containers.

It might take 5-15 minutes for them to be ready for jobs. Maybe boxes come up a lot faster on DO, but AWS can easily take 15 minutes. And even if booting is super fast on DO, it's quite likely you'll have an expensive Puppet (or whatever) setup procedure.

Therefore, you want to have a way to keep them around while load is high. Otherwise you keep taking the startup overhead.
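One way to picture "keeping them around while load is high" is a small warm pool with an idle timeout. The sketch below is an assumption-laden Python illustration: `WarmPool`, `boot`, `destroy`, and `idle_timeout` are invented names, and timestamps are passed in explicitly to keep the logic easy to follow.

```python
from typing import Any, Callable, List, Tuple


class WarmPool:
    """Reuse already-booted workers; destroy them only after sitting idle."""

    def __init__(self, boot: Callable[[], Any],
                 destroy: Callable[[Any], None],
                 idle_timeout: float) -> None:
        self._boot = boot
        self._destroy = destroy
        self._idle_timeout = idle_timeout
        self._idle: List[Tuple[Any, float]] = []  # (worker, idle_since)

    def acquire(self) -> Any:
        """Hand out a warm worker if one exists, else pay the boot cost."""
        if self._idle:
            worker, _ = self._idle.pop()
            return worker
        return self._boot()

    def release(self, worker: Any, now: float) -> None:
        """Return a worker to the pool instead of destroying it."""
        self._idle.append((worker, now))

    def reap(self, now: float) -> None:
        """Destroy workers that have been idle for idle_timeout or longer."""
        keep = []
        for worker, since in self._idle:
            if now - since >= self._idle_timeout:
                self._destroy(worker)
            else:
                keep.append((worker, since))
        self._idle = keep
```

During a spike, jobs keep calling `acquire` and `release`, so workers rarely sit idle long enough to be reaped; overnight, `reap` gradually empties the pool.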

kfatehi commented 9 years ago

Actually, I see: since each run costs money, a shop may not find it worth it to run every build that accumulates during a spike.

niallo commented 9 years ago

Yes, that's another issue - on some providers (EC2 being the big one) you pay a minimum of one hour per box.
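A quick arithmetic sketch of why the one-hour minimum matters for parallel builds. It assumes, beyond what is stated here, that usage is rounded up to whole hours per box; the price is a made-up unit, not a real provider rate.

```python
import math
from typing import Sequence


def billed_hours(minutes_used: float, minimum_hours: int = 1) -> int:
    """Hours charged for one box, assuming round-up to whole hours."""
    return max(minimum_hours, math.ceil(minutes_used / 60))


def serial_cost(job_minutes: Sequence[float], price_per_hour: float) -> float:
    """All jobs run back to back on a single box."""
    return billed_hours(sum(job_minutes)) * price_per_hour


def parallel_cost(job_minutes: Sequence[float], price_per_hour: float) -> float:
    """Each job gets its own box, which is destroyed when the job ends."""
    return sum(billed_hours(m) for m in job_minutes) * price_per_hour


jobs = [10, 10, 10, 10, 10, 10]  # six 10-minute builds
print(serial_cost(jobs, 1))      # 1  (one box, one billed hour)
print(parallel_cost(jobs, 1))    # 6  (six boxes, one billed hour each)
```

Under these assumptions, one throwaway box per job is several times more expensive than reusing a box for short builds, which is the argument for reuse.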

kfatehi commented 9 years ago

Right, forgot about that. DO also has a one-hour minimum per box. Perhaps that should be built into the plugin, then, as a baseline for reuse...
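Using the billing hour as a baseline for reuse could look something like the following sketch: an idle VM is kept until shortly before its already-paid hour would roll over. The function names, the 5-minute `margin`, and the assumption that billing periods are counted from VM creation are all illustrative.

```python
def seconds_left_in_paid_period(created_at: float, now: float,
                                period: float = 3600.0) -> float:
    """Seconds remaining in the billing period that is already paid for."""
    elapsed = now - created_at
    return period - (elapsed % period)


def should_destroy(created_at: float, now: float, idle: bool,
                   margin: float = 300.0, period: float = 3600.0) -> bool:
    """Destroy an idle VM only when its paid period is about to end.

    A busy VM is never destroyed; an idle VM with plenty of paid time
    left is kept around in case another job arrives.
    """
    if not idle:
        return False
    return seconds_left_in_paid_period(created_at, now, period) <= margin
```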

garymcleanhall commented 9 years ago

Atlassian Bamboo, which uses AWS, takes 15 minutes to spin up a box and get its JVM running. It lets you configure how long to keep a box around, so something similar makes sense here.

Provisioning is the question mark for me. You can use the DO API to spin a new box up, but you then need Puppet or Chef to install something meaningful on it. For Chef, it might involve making the Strider server a knife workstation so that it can create a droplet, bootstrap it, and then provision it (using chef-zero or a specified Chef server)?

niallo commented 9 years ago

@garymcleanhall Right. Some provisioning step is needed. I think this should be a generic shell script, with some sane minimal defaults.

Once Node is on the target machine, Strider can send its own code over SSH to be executed by the worker.
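A rough sketch of the "generic shell script with defaults" idea: pipe a script to `sh -s` on the worker over SSH. The default script contents (installing Node.js with `apt-get` on a Debian or Ubuntu image), the `root` user, and the function names are assumptions for illustration; host-key handling and authentication are left out entirely.

```python
import subprocess
from typing import Callable, List

# Hypothetical minimal default: assumes a Debian/Ubuntu image.
DEFAULT_PROVISION_SCRIPT = """#!/bin/sh
set -e
apt-get update
apt-get install -y nodejs
"""


def ssh_command(host: str, user: str = "root",
                remote_command: str = "sh -s") -> List[str]:
    """Build the ssh argv; 'sh -s' makes the remote shell read stdin."""
    return ["ssh", "%s@%s" % (user, host), remote_command]


def provision(host: str, script: str = DEFAULT_PROVISION_SCRIPT,
              run: Callable = subprocess.run):
    """Send the provisioning script to the worker and fail loudly on error."""
    return run(ssh_command(host), input=script, text=True, check=True)
```

A project could override `script` with its own setup steps while still getting a working default.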

kfatehi commented 9 years ago

I want to target DO first because:

- VMs spin up in 55 seconds.
- A VM can be spun up from a saved "image", which is preserved for you on your DO account at no charge.
- The API allows saving a machine to an image and starting a machine from an image: everything we need.

Then I would target OpenStack... not AWS; someone else can do that, I stay away from it. Also, before blindly reusing my existing work with the DO API, it is worth investigating pkgcloud. If pkgcloud supports the subset of functionality we require, then one can pick any cloud. It just seems wiser to focus on DO first.

I think the user needs to be told to create an image, or the plugin should guide them through this process. Thereafter, the image should be selected from a dropdown for use on all future builds.
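For illustration, the two API operations mentioned above (save a machine to an image, start a machine from an image) might be expressed as HTTP requests like the ones below. The endpoints and payload fields reflect my understanding of the DigitalOcean v2 REST API and should be checked against the current documentation; the helper names are hypothetical, and the sketch only builds requests without sending them.

```python
import json
import urllib.request

API = "https://api.digitalocean.com/v2"


def _post(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer %s" % token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def snapshot_request(token: str, droplet_id: int, name: str):
    """Request that saves an existing droplet as a reusable image."""
    return _post("/droplets/%d/actions" % droplet_id, token,
                 {"type": "snapshot", "name": name})


def create_from_image_request(token: str, name: str, region: str,
                              size: str, image):
    """Request that starts a new droplet from a saved image (ID or slug)."""
    return _post("/droplets", token,
                 {"name": name, "region": region,
                  "size": size, "image": image})
```

Sending one of these would be a matter of passing it to `urllib.request.urlopen`; the image chosen in the plugin's dropdown would be what ends up in the `image` field.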


Strider-CD / strider

Plugin Idea: Dynamic temporary virtual machine workers via DigitalOcean #675