Closed schmittel closed 8 months ago
Hi @schmittel. I see. The issue really comes down to the combination of --SkipMash and --S_algorithm ANImf. When you skip Mash, dRep has to run 1500 x 1500 = 2,250,000 genome comparisons. So even if each comparison is relatively quick, the run is going to take a long time (4 days is about what I would expect).
In this case the main thing to do is switch to --S_algorithm fastANI. It's about 10 times faster than ANImf, so your run should take about 10% as long. You could also remove --run_tertiary_clustering in this scenario, since with --SkipMash it probably isn't impacting things much anyway.
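To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of dRep; the function names are made up for illustration. It assumes the all-vs-all count of n x n comparisons mentioned above, that the comparisons dominate the total runtime, and that the "about 10 times faster" figure holds for your data, which it may not exactly.

```python
# Back-of-the-envelope runtime estimate (illustrative only, not dRep code).

def pairwise_comparisons(n_genomes: int) -> int:
    """All-vs-all comparison count when Mash pre-clustering is skipped."""
    return n_genomes * n_genomes


def estimated_days(observed_days: float, speedup: float) -> float:
    """Scale an observed runtime by an assumed speedup factor."""
    return observed_days / speedup


n_genomes = 1500
observed_days = 4.0    # runtime reported in this thread with ANImf
assumed_speedup = 10.0  # rough figure; treat as an assumption

comparisons = pairwise_comparisons(n_genomes)
# Implied average wall-clock cost per comparison for the observed run
seconds_per_comparison = observed_days * 86400 / comparisons
fastani_days = estimated_days(observed_days, assumed_speedup)

print(f"comparisons: {comparisons:,}")
print(f"implied seconds per comparison: {seconds_per_comparison:.4f}")
print(f"estimated runtime with fastANI: {fastani_days:.1f} days")
```

The same functions can be reused for other batch sizes; because the count grows with the square of the number of genomes, halving a batch cuts the comparisons to a quarter.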
Best, Matt
Hi there,
I'm having an issue where it's taking ~4 days to dereplicate 1500 bacterial assemblies. I have many batches consisting of these ~1500 assemblies so overall this is going to take way too long. Given your knowledge of the different programs run by dRep and their efficiencies, I'm wondering whether you could offer any advice for optimizing the cluster jobs that I am submitting? Here are the parameters I am currently working with:
Do you have any suggestions for adjustments that might be specifically optimal for dRep?
Here's my dRep command:
Many thanks