4dn-dcic / tibanna

Tibanna helps you run your genomic pipelines on Amazon cloud (AWS). It is used by the 4DN DCIC (4D Nucleome Data Coordination and Integration Center) to process data. Tibanna supports CWL/WDL (w/ docker), Snakemake (w/ conda) and custom Docker/shell commands.
MIT License

Costing of Tibanna Runs #264

Closed: tinyheero closed this issue 4 years ago

tinyheero commented 4 years ago

Hi there,

Thanks for this software. I've been experimenting with it using Snakemake workflows and trying to get my head around how it works. If I understand correctly, this is what happens under the hood:

  1. Each rule/job is converted into an AWS Lambda function
  2. A set of Lambda functions are strung together using AWS Step Functions
  3. Each lambda function will launch an EC2 instance, download the files it needs from S3, run the job/rule, upload the files to S3, and then terminate the instance.

In this way, it provides serverless computing, since you don't need to provision EC2 instances before running the Snakemake workflow.

What is not clear to me is whether the EC2 instances are charged by the hour or by actual run time (which is the major benefit of AWS Batch). If they are charged by the hour, wouldn't Tibanna runs be expensive, since each job launches a new EC2 instance?

SooLee commented 4 years ago

Hi @tinyheero, sorry, I just saw the message. You're right about Tibanna, except that the Lambda functions are preconfigured and already set up on AWS. What changes is the input we pass to the Lambda functions, which includes information about the EC2 instance configuration.
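The point here is that the deployed Lambda code stays fixed and only its input varies from job to job. As a minimal sketch of that idea, the Python below builds a per-job payload that carries its own EC2 settings. The `Ec2Config` class, the `build_job_input` function, the `job_name`/`command`/`config` keys and the `ebs_size_gb` field are all hypothetical names chosen for illustration; Tibanna's real job-description format is defined in its own documentation and may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Ec2Config:
    """Per-job EC2 settings; field names are illustrative, not Tibanna's schema."""
    instance_type: str
    ebs_size_gb: int

    def __post_init__(self) -> None:
        # Reject obviously invalid disk sizes before anything is sent to AWS.
        if self.ebs_size_gb <= 0:
            raise ValueError("ebs_size_gb must be a positive number of GB")


def build_job_input(job_name: str, command: str, ec2: Ec2Config) -> dict:
    """Build the payload for a hypothetical, already-deployed launcher function.

    The deployed function is the same for every job; only this payload,
    including the EC2 settings, changes from job to job.
    """
    return {
        "job_name": job_name,
        "command": command,
        "config": asdict(ec2),
    }


if __name__ == "__main__":
    # Two hypothetical Snakemake rules with very different resource needs.
    jobs = [
        build_job_input("align", "snakemake align", Ec2Config("c5.4xlarge", 200)),
        build_job_input("summarize", "snakemake summarize", Ec2Config("t3.small", 10)),
    ]
    for job in jobs:
        print(job["job_name"], job["config"])
```

Keeping the launcher generic and pushing all per-job variation into its input is what lets each job choose its own instance type and disk size without redeploying anything.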

The instances are charged by the second. I guess one advantage of Tibanna is that it gives you more flexibility in optimizing your jobs: every job can have its own EC2 configuration, including instance type and EBS size.
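To see why launching a fresh instance for every job need not be expensive under per-second billing, the sketch below compares the two billing models raised in the question. The $0.10/hour rate and the job runtimes are made-up numbers, and the 60-second minimum charge per launch is an assumption based on AWS's per-second billing for Linux instances; check current EC2 pricing before using real figures.

```python
import math

# Hypothetical on-demand price in USD per instance-hour (made up for illustration).
HOURLY_RATE_USD = 0.10

# Assumed minimum billed duration per instance launch under per-second billing.
MIN_BILLED_SECONDS = 60


def per_second_cost(runtime_s: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost when billed by the second, with a minimum charge per launch."""
    billed = max(math.ceil(runtime_s), MIN_BILLED_SECONDS)
    return billed * rate / 3600


def per_hour_cost(runtime_s: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost if every started hour were billed in full (for comparison only)."""
    return math.ceil(runtime_s / 3600) * rate


def workflow_cost(job_runtimes_s, billing) -> float:
    """Each job runs on its own instance, so each runtime is billed separately."""
    return sum(billing(t) for t in job_runtimes_s)


if __name__ == "__main__":
    # Ten hypothetical jobs of 5 minutes each, including instance startup time.
    jobs = [300] * 10
    print(f"per-second billing: ${workflow_cost(jobs, per_second_cost):.4f}")
    print(f"per-hour billing:   ${workflow_cost(jobs, per_hour_cost):.4f}")
```

With ten 5-minute jobs, per-second billing charges for 50 instance-minutes in total, whereas rounding each job up to a full hour would charge for 10 instance-hours, a twelvefold difference. Under per-second billing, the remaining per-job overhead is mainly instance startup time (and any minimum charge), not a wasted partial hour.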

tinyheero commented 4 years ago

Thanks for this information.