ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Runtime estimation #65

Open anandksrao opened 5 years ago

anandksrao commented 5 years ago

Under the System Requirements subheading, it says:

For primate-sized genomes (3 gigabases each), you should expect Cactus to use approximately 120 CPU-days of compute per genome, with about 120 GB of RAM used at peak. The requirements scale roughly quadratically, so aligning two 1-megabase bacterial genomes takes only 1.5 CPU-hours and 14 GB RAM.

If I were to align 3, 4, or 5 genomes, each roughly 300-400 Mb in size, could you please explain how you would arrive at your best estimates for RAM and run time in CPU-hours? The quadratic scaling is not quite clear from the example given, hence this request.

Also, is there any sense of how the repeat content (%) of the input genomes might influence these estimates?

Thanks!
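
To make the arithmetic in question concrete, here is a minimal sketch that rescales the quoted primate figure to a smaller genome under an assumed pure power law, and also computes the exponent that the two quoted data points would imply. This is not how Cactus estimates anything; the function names `scale` and `implied_exponent` are hypothetical, and the sketch assumes that the quoted 1.5 CPU-hours covers both bacterial genomes (0.75 CPU-hours each) and that cost depends on genome size alone.

```python
import math

# Figures taken from the quoted System Requirements passage.
PRIMATE_SIZE_BP = 3e9           # 3 gigabases per genome
PRIMATE_CPU_HOURS = 120 * 24    # 120 CPU-days per genome, in CPU-hours
BACTERIAL_SIZE_BP = 1e6         # 1 megabase per genome
BACTERIAL_CPU_HOURS = 1.5 / 2   # ASSUMPTION: 1.5 CPU-hours is the total for two genomes


def scale(reference_value, reference_size_bp, target_size_bp, exponent):
    """Rescale a reference cost under an assumed power law in genome size."""
    return reference_value * (target_size_bp / reference_size_bp) ** exponent


def implied_exponent(value_a, size_a, value_b, size_b):
    """Exponent k such that value is proportional to size ** k through both points."""
    return math.log(value_a / value_b) / math.log(size_a / size_b)


if __name__ == "__main__":
    target_size_bp = 350e6  # middle of the 300-400 Mb range asked about
    for n_genomes in (3, 4, 5):
        # ASSUMPTION: total cost is the per-genome cost times the number of genomes.
        quadratic = n_genomes * scale(PRIMATE_CPU_HOURS, PRIMATE_SIZE_BP, target_size_bp, 2)
        linear = n_genomes * scale(PRIMATE_CPU_HOURS, PRIMATE_SIZE_BP, target_size_bp, 1)
        print(f"{n_genomes} genomes: quadratic ~{quadratic:.0f} CPU-hours, "
              f"linear ~{linear:.0f} CPU-hours")

    k = implied_exponent(PRIMATE_CPU_HOURS, PRIMATE_SIZE_BP,
                         BACTERIAL_CPU_HOURS, BACTERIAL_SIZE_BP)
    print(f"exponent implied by the two quoted CPU figures: {k:.2f}")
```

Under these assumptions the two quoted CPU figures imply an exponent close to 1 rather than 2, which may be part of why the quadratic explanation is hard to follow from the example. The quadratic and linear results are probably best read as rough lower and upper bounds, not as predictions.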

amizeranschi commented 5 years ago

Hey @anandksrao

I have some runtime info (from test runs on smaller genomes) for a 5-genome alignment in this post and the one after it: https://github.com/ComparativeGenomicsToolkit/cactus/issues/63#issuecomment-475567776.

It would help a lot if you had anything similar to share.