ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

question about cloud budget #276

Open charlie2021cc opened 7 months ago

charlie2021cc commented 7 months ago

Hello, I noticed the cloud budget recorded in your wiki: "Starting late in the evening on January 11th and running Serratus casually for the next 11 days (non-continuous use, at ~80% of its maximum capacity to favour stability over performance) we complete a search of 5,686,715 sequencing libraries (10.2 petabases). The total cost of a full ground-up re-analysis was $23,980 or $0.0042 per library. This value reflects the current state-of-the-art for Serratus, and to the best of our knowledge any means of ultra-rapid access to petabases of sequencing data."
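As a quick sanity check, the per-library figure follows directly from the two totals in the quote. Dividing the same budget by the 10.2 petabases also gives an average library size and a cost per terabase. This is plain arithmetic on the quoted numbers, not an official breakdown from the Serratus team.

```python
# Totals quoted from the Serratus wiki passage above.
TOTAL_COST_USD = 23_980
LIBRARIES = 5_686_715
PETABASES = 10.2

# Derived figures (simple division, nothing more).
cost_per_library = TOTAL_COST_USD / LIBRARIES
bases_per_library = PETABASES * 1e15 / LIBRARIES      # 1 petabase = 1e15 bases
cost_per_terabase = TOTAL_COST_USD / (PETABASES * 1e3)  # 1 petabase = 1,000 terabases

print(f"cost per library:  ${cost_per_library:.4f}")
print(f"mean library size: {bases_per_library / 1e9:.2f} Gbp")
print(f"cost per terabase: ${cost_per_terabase:.2f}")
```

This reproduces the quoted ~$0.0042 per library and implies an average of roughly 1.8 Gbp per library at about $2.35 per terabase searched.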

I have set up a test cluster of 5 EC2 instances and have finished testing 10,000 libraries. For the initial download phase alone, my costs have reached $0.03 per library, roughly seven times your total per-library cost ($0.0042). Could you share the details of your cluster configuration? Specifically, I am interested in the EC2 instance types, the number of instances, the network bandwidth, and the degree of task parallelism per instance.
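The size of the gap can be checked from the numbers in this thread. The sketch below uses only the wiki totals and the $0.03 download-phase figure from the test cluster; note that the comparison is lopsided in Serratus's favour, since the wiki figure covers the whole pipeline while $0.03 covers the download step only.

```python
# Serratus full-pipeline cost per library, from the wiki totals.
SERRATUS_COST_PER_LIBRARY = 23_980 / 5_686_715  # ~ $0.0042

# Test-cluster figures from the comment above (download phase only).
my_cost_per_library = 0.03
test_libraries = 10_000

ratio = my_cost_per_library / SERRATUS_COST_PER_LIBRARY

print(f"ratio: {ratio:.1f}x")
print(f"test run spend (download only): ${my_cost_per_library * test_libraries:,.0f}")
print(f"same run at the Serratus rate:  ${SERRATUS_COST_PER_LIBRARY * test_libraries:,.0f}")
```

The ratio comes out at about 7.1x, i.e. roughly $300 versus $42 for a 10,000-library test run.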

ababaian commented 7 months ago

Yeah, it's all in the Terraform config: https://github.com/ababaian/serratus/blob/master/terraform/main/main.tf. We also report the specifics of the cluster in the paper.

Also, depending on your query FASTA file and the type of libraries you're analyzing, cost will vary substantially with the hit rate of your seed k-mers in bowtie/diamond.
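To see why hit rate matters, here is a deliberately simple, hypothetical cost model: each library pays a fixed overhead (download plus setup) and an alignment cost that grows with the number of seed hits, spread across the jobs running in parallel on one instance. `CostModel`, its fields, and every number in the usage lines are illustrative assumptions, not Serratus measurements or its actual scheduler logic.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical per-library cost model; all parameters are illustrative."""
    hourly_price_usd: float          # price of one worker instance per hour
    concurrent_jobs: int             # libraries processed in parallel per instance
    base_seconds: float              # fixed per-library time (download + setup)
    seconds_per_million_hits: float  # extra alignment time per million seed hits

    def cost_per_library(self, seed_hits: float) -> float:
        # Wall-clock seconds one job spends on a library.
        seconds = self.base_seconds + self.seconds_per_million_hits * seed_hits / 1e6
        # Each job only occupies 1/concurrent_jobs of the instance.
        instance_hours = seconds / 3600 / self.concurrent_jobs
        return instance_hours * self.hourly_price_usd


# Illustrative numbers only, not measured Serratus values.
model = CostModel(hourly_price_usd=0.50, concurrent_jobs=8,
                  base_seconds=300, seconds_per_million_hits=60)
for hits in (1e5, 1e7, 5e7):
    print(f"{hits:>12,.0f} seed hits -> ${model.cost_per_library(hits):.4f} per library")
```

Under these made-up parameters, a query that produces many seed hits spends most of its time in alignment rather than download, so the same cluster can differ several-fold in per-library cost depending on the query and library type.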