Paper: Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost? | Frontiers in Neuroinformatics

URL: https://www.frontiersin.org/articles/10.3389/fninf.2017.00063/full

This paper does...

Describes when Amazon Web Services is an appropriate choice for neuroimaging applications and prescribes use-cases accordingly. The authors also created and released a tool that estimates the cost of a workflow from a few basic workflow features.

| Figure 1: Clusters are always more expensive than Amazon if you have to pay for them, and dedicated workstations are only cheaper if they have good up-time. This does not include the cost of power/internet/etc.
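
The up-time argument can be made concrete with a break-even calculation. What follows is a minimal sketch, not the authors' tool: it assumes the workstation's only cost is its purchase price spread over its lifetime (power/internet/etc. excluded, as in the figure), and the function names and all dollar figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
HOURS_PER_YEAR = 365 * 24


def owned_cost_per_used_hour(purchase_price, lifetime_years, utilization):
    """Effective cost of one *used* hour on hardware you bought.

    utilization is the fraction of the lifetime the machine is actually
    running jobs (0 < utilization <= 1). Idle hours are paid for but unused,
    so low utilization makes every used hour more expensive.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    used_hours = lifetime_years * HOURS_PER_YEAR * utilization
    return purchase_price / used_hours


def break_even_utilization(purchase_price, lifetime_years, cloud_price_per_hour):
    """Utilization above which the owned machine is cheaper per used hour.

    A result greater than 1 means the owned machine can never beat the
    cloud price under this simple model.
    """
    total_hours = lifetime_years * HOURS_PER_YEAR
    return purchase_price / (total_hours * cloud_price_per_hour)


# Hypothetical numbers: a $2628 workstation kept for 3 years, compared with
# a cloud instance priced at $0.20 per hour.
threshold = break_even_utilization(2628, 3, 0.20)
print(f"break-even utilization: {threshold:.0%}")
```

Under these made-up numbers the workstation only wins if it is busy more than half of the time, which is the sense in which "good up-time" matters.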

Storage costs considerably more on Amazon than locally, if you already have hard drives.

| Figure 2: It takes much longer to run things serially :)

They use cfncluster on AWS for benchmarking costs. This is an alternative to Batch which is not obviously better for pipeline deployment, but has added flexibility to support services (such as ECS tasks).

| Figure 4, 5, 6, 7: c4.xlarge instances seem to give the best bang for the buck. m4.large has strangely high variance.

| Table 2: Workstations are faster than AWS.

| Figure 8, 9: GPU acceleration makes things faster, but it is almost always more expensive.

Some tools for estimating benchmarking costs are here.
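
To illustrate the kind of estimate such a tool might produce from basic workflow features, here is a rough sketch. It is not the authors' tool and makes strong assumptions: every subject takes the same time, billing is strictly proportional to instance-hours, and storage is billed per GB-month. The function name, parameters, and all prices are hypothetical.

```python
import math


def estimate_workflow_cost(n_subjects, hours_per_subject, price_per_instance_hour,
                           n_instances=1, storage_gb=0.0,
                           price_per_gb_month=0.0, storage_months=1.0):
    """Back-of-the-envelope cost and wall-clock estimate for a batch workflow.

    Assumes one subject runs on one instance at a time and that instances
    are only paid for while they are processing a subject.
    """
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    compute_hours = n_subjects * hours_per_subject
    compute_cost = compute_hours * price_per_instance_hour
    storage_cost = storage_gb * price_per_gb_month * storage_months
    # Subjects are processed in rounds of up to n_instances at a time.
    wall_clock_hours = math.ceil(n_subjects / n_instances) * hours_per_subject
    return {
        "compute_cost": compute_cost,
        "storage_cost": storage_cost,
        "total_cost": compute_cost + storage_cost,
        "wall_clock_hours": wall_clock_hours,
    }


# Hypothetical workflow: 100 subjects at 2 hours each on $0.20/hour
# instances, with 500 GB stored for one month at $0.02 per GB-month.
serial = estimate_workflow_cost(100, 2, 0.20, n_instances=1,
                                storage_gb=500, price_per_gb_month=0.02)
parallel = estimate_workflow_cost(100, 2, 0.20, n_instances=10,
                                  storage_gb=500, price_per_gb_month=0.02)
print(serial["wall_clock_hours"], parallel["wall_clock_hours"])
print(serial["total_cost"], parallel["total_cost"])
```

In this simplified model, spreading the subjects over more instances shortens the wall-clock time without changing the compute bill, since the same number of instance-hours is consumed either way.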

This paper does not...

Provide a way to deploy jobs on AWS (i.e. clowdr).
Address why m4 instances are so variable.

Additional Notes?

In the context of Clowdr, I should cite this paper in the discussion as an example of how to evaluate the cost of deploying workflows on Amazon before launching full-scale deployments in Clowdr, whether locally, on a cluster, or on the cloud.
