steinbachr closed this 4 years ago.
There are Docker images on Docker Hub that we could use if we choose to go that route. Here's one: https://hub.docker.com/r/johnktims/folding-at-home/
We need to compare EC2 and Fargate in terms of billing predictability and their scaling/scheduling engines.
Agreed. However, given our time constraints, I think we should forge ahead with EC2. If we end up with a day to spare, we can circle back to the infrastructure and run tests comparing EC2 and Fargate.
This CloudFormation template creates a launch template and an Auto Scaling group: https://github.com/jkataja/cfn-foldingathome
@jkataja thanks, I referenced your project in #1. What you've done is great, and I don't want to change anything about it; I just want to design a scaling algorithm around it.
Here are my ideas and assumptions for the MVP:
Other technical features:
I'm 100% on board. One point of clarification: you mean SQS messages, not SNS, right?
Yep, typo
Splitting off the last monitoring point to #6
@atkinsonm glad you find it useful! The easiest way to scale would be to change the Auto Scaling group size. It is controlled by a stack parameter, so the simplest approach is a CloudFormation stack update that changes that parameter. The template installs the required software during instance initialization, from the user data script. I did not create an AMI, both to keep the template easy to use in different accounts and to avoid license issues with the commercial NVIDIA CUDA drivers and the Folding@home client. If everything is contained within a single account, scaling up would be much faster with a pre-baked AMI that already has all the software installed.
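A minimal Python sketch of the stack-update approach, assuming boto3's CloudFormation client (`describe_stacks` and `update_stack`). The parameter key `DesiredCapacity` is hypothetical; check the template for the name of the parameter that actually sets the group size. The client is passed in rather than created inside the function, so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict, List

# Hypothetical key: replace with the template's actual size parameter.
SIZE_PARAM = "DesiredCapacity"


def build_parameters(existing_keys: List[str], size_key: str, new_size: int) -> List[Dict[str, Any]]:
    """Set the size parameter and keep every other parameter at its previous value."""
    if size_key not in existing_keys:
        raise KeyError(f"stack has no parameter named {size_key!r}")
    params: List[Dict[str, Any]] = []
    for key in existing_keys:
        if key == size_key:
            params.append({"ParameterKey": key, "ParameterValue": str(new_size)})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def scale_stack(cf_client: Any, stack_name: str, new_size: int,
                size_key: str = SIZE_PARAM) -> Dict[str, Any]:
    """Resize the fleet with a stack update that reuses the deployed template."""
    stack = cf_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    keys = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    return cf_client.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=build_parameters(keys, size_key, new_size),
        # Assumption: the template creates IAM resources such as an instance role.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )
```

With credentials configured, this would be called as `scale_stack(boto3.client("cloudformation"), "my-folding-stack", 4)`, where the stack name is a placeholder. Going through the stack rather than editing the group directly keeps the deployed stack and the real group size in agreement.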
Also, the template uses the smallest G4 instance size, g4dn.xlarge, to keep costs low. In eu-north-1 the on-demand price is 0.5580 USD per hour, and the cheapest spot price is 0.1674. I also assume the Folding@home client is best tested with a single GPU present; I didn't want to deal with multi-GPU issues.
Just checked: the GPU task took 2 hours 12 minutes (11:43 to 13:55) to run. Edit: the CPU task took 5 hours 28 minutes (04:24 to 09:52) to run.
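The two quoted prices make the spot saving easy to quantify. A small Python sketch of that arithmetic, assuming both figures are USD per instance-hour and that the spot price stays flat over the period (in reality it fluctuates); `fleet_cost` is a hypothetical helper for a fleet of identical instances.

```python
ON_DEMAND_HOURLY = 0.5580  # g4dn.xlarge on-demand, eu-north-1 (quoted above)
SPOT_HOURLY = 0.1674       # cheapest spot price quoted above


def spot_discount(on_demand: float, spot: float) -> float:
    """Fraction saved by running on spot instead of on-demand."""
    return 1 - spot / on_demand


def fleet_cost(hourly: float, instances: int, hours: float) -> float:
    """Cost of `instances` identical instances for `hours`, assuming a flat price."""
    return hourly * instances * hours


print(f"spot discount: {spot_discount(ON_DEMAND_HOURLY, SPOT_HOURLY):.0%}")
print(f"10 instances for 24 h on-demand: {fleet_cost(ON_DEMAND_HOURLY, 10, 24):.2f}")
print(f"10 instances for 24 h on spot:   {fleet_cost(SPOT_HOURLY, 10, 24):.2f}")
```

The spot figure is exactly 0.3 times the on-demand figure, i.e. a 70% discount at the quoted prices.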
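These timings can be turned into a rough per-task cost. A minimal Python sketch, assuming the start and end readings are HH:MM clock times on the same day (an end earlier than the start is treated as crossing midnight) and that the full g4dn.xlarge hourly price quoted earlier in the thread is charged to one task. If the GPU and CPU slots share an instance, this overstates the per-task cost; `task_duration` and `task_cost` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def task_duration(start: str, end: str) -> timedelta:
    """Elapsed time between two HH:MM clock readings."""
    fmt = "%H:%M"
    begin = datetime.strptime(start, fmt)
    finish = datetime.strptime(end, fmt)
    if finish < begin:  # assume the task ran past midnight
        finish += timedelta(days=1)
    return finish - begin


def task_cost(duration: timedelta, hourly_price: float) -> float:
    """Instance cost attributed to one task at a flat hourly price."""
    return duration.total_seconds() / 3600 * hourly_price


gpu = task_duration("11:43", "13:55")
cpu = task_duration("04:24", "09:52")
print(gpu, cpu)
print(f"GPU task on-demand: {task_cost(gpu, 0.5580):.2f}, spot: {task_cost(gpu, 0.1674):.2f}")
```

The GPU run works out to 2.2 instance-hours, so at these prices it costs roughly 1.23 on-demand or 0.37 on spot.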
The software can run anywhere in the world. The optimal solution would take pricing differences into account and run in the cheapest Availability Zones, possibly even the cheapest regions.
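Choosing the cheapest zone could start from EC2's spot price history. A minimal Python sketch, assuming records shaped like the `SpotPriceHistory` entries returned by boto3's `describe_spot_price_history` (each with `AvailabilityZone`, `SpotPrice` as a string, and `Timestamp`). The helpers `latest_prices` and `cheapest_zone` are hypothetical, and on-demand price differences between regions are not considered here.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Optional, Tuple


def latest_prices(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Most recent spot price per Availability Zone."""
    newest: Dict[str, Tuple[datetime, float]] = {}
    for rec in records:
        zone, ts = rec["AvailabilityZone"], rec["Timestamp"]
        price = float(rec["SpotPrice"])
        if zone not in newest or ts > newest[zone][0]:
            newest[zone] = (ts, price)
    return {zone: price for zone, (_, price) in newest.items()}


def cheapest_zone(records: Iterable[Dict[str, Any]]) -> Optional[Tuple[str, float]]:
    """(zone, price) with the lowest current spot price, or None if there is no data."""
    prices = latest_prices(records)
    if not prices:
        return None
    zone = min(prices, key=prices.get)
    return zone, prices[zone]
```

For one region, the records could come from `boto3.client("ec2", region_name="eu-north-1").describe_spot_price_history(InstanceTypes=["g4dn.xlarge"], ProductDescriptions=["Linux/UNIX"])["SpotPriceHistory"]` (the response is paginated). Concatenating the lists from several regions extends the same comparison across regions, since zone names include the region.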
Marking as a duplicate of #1
This will be a requirement for #2.
We have two options here, and I'm not yet sure which is the right one:
Option 1
For this, we back the compute with EC2 resources and an Auto Scaling group (ASG), so we'll need:
Option 2
We back the compute with Fargate resources. A prerequisite here is that we can get Folding@home working properly in a container, and that the container can be given enough access to the host's compute resources: