4dn-dcic / tibanna

Tibanna helps you run your genomic pipelines on Amazon cloud (AWS). It is used by the 4DN DCIC (4D Nucleome Data Coordination and Integration Center) to process data. Tibanna supports CWL/WDL (w/ docker), Snakemake (w/ conda) and custom Docker/shell command.
MIT License

Large amounts of NAT gateway costs #397

Closed: laurentiush closed this issue 1 year ago

laurentiush commented 1 year ago

Hi, I am using Tibanna together with Snakemake and I am incurring large NAT gateway costs. They seem to come from the EC2 instances downloading Tibanna and Snakemake every time an instance is created.

Is there a way to provide those packages/containers from my own cloud environment instead of pulling them from the internet? As it stands, using Tibanna with Snakemake is prohibitively expensive.

laurentiush commented 1 year ago

I think I solved it by pushing the Tibanna Docker image to my ECR and pointing Tibanna to it with the awsf_image parameter.
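
For anyone landing here later, a minimal sketch of that override in Python. It assumes awsf_image lives under the "config" section of the job description, which is how I understand the Tibanna docs, so please verify against your version. The account ID, region, repository name, tag and log bucket are placeholders, and both helper functions are hypothetical names, not part of Tibanna.

```python
import json


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build an ECR image URI from its parts (standard ECR registry naming)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def with_awsf_image(job_description: dict, image_uri: str) -> dict:
    """Return a copy of the job description with config.awsf_image set.

    Assumption: Tibanna reads the override from the "config" section.
    The input dict is left unmodified.
    """
    updated = dict(job_description)
    config = dict(updated.get("config", {}))
    config["awsf_image"] = image_uri
    updated["config"] = config
    return updated


if __name__ == "__main__":
    # Placeholder job description; a real one also needs "args" and more config.
    job = {"config": {"log_bucket": "my-tibanna-logs"}}
    uri = ecr_image_uri("123456789012", "us-east-1", "tibanna-awsf", "my-tag")
    print(json.dumps(with_awsf_image(job, uri), indent=2))
```

The resulting JSON can then be submitted the same way as any other job description. The EC2 instance role also needs permission to pull from that ECR repository.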

willronchetti commented 1 year ago

Yes, that is the solution we use internally to run Tibanna at high scale. We haven't used Docker Hub in some time, for this reason and for others as well.

laurentiush commented 1 year ago

Ah, good to hear that you are using that solution as well. I am having a similar issue with conda environments. Is it possible to provide a 'local' Snakemake conda environment instead of downloading it from bioconda/conda-forge every time?

Providing an S3 bucket as --conda-prefix does not seem to work.

willronchetti commented 1 year ago

We publish all Tibanna steps as Docker images on ECR, so we eliminate this redundant cost by building once and pushing to ECR. At execution time (with Tibanna + CWL, for example), jobs pull those Docker images down to run.
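
A rough sketch of the "build once, push to ECR" step, written as a small Python helper that only assembles the usual AWS CLI and Docker commands without running them. It is not the DCIC's actual tooling: the function name is hypothetical, the image and repository names are placeholders, and it assumes the ECR repository already exists and the AWS CLI v2 is configured.

```python
import shlex


def ecr_push_commands(local_image: str, account_id: str, region: str,
                      repository: str, tag: str) -> list[str]:
    """Return the shell commands to log in to ECR, tag a local image, and push it."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    remote = f"{registry}/{repository}:{tag}"
    return [
        # Authenticate the Docker client against the private ECR registry.
        f"aws ecr get-login-password --region {shlex.quote(region)} | "
        f"docker login --username AWS --password-stdin {shlex.quote(registry)}",
        # Give the locally built image its ECR name.
        f"docker tag {shlex.quote(local_image)} {shlex.quote(remote)}",
        # Upload it once; later jobs pull from ECR instead of rebuilding.
        f"docker push {shlex.quote(remote)}",
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for command in ecr_push_commands(
        "my-pipeline-step:latest", "123456789012", "us-east-1",
        "my-pipeline-step", "v1",
    ):
        print(command)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; they can be pasted into a shell or a CI job.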