faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Using the --continue option from spades #291

Open aureliendejode opened 1 year ago

aureliendejode commented 1 year ago

Hello,

I am using phyluce to analyze UCE data, and some of my SPAdes assemblies take so long that the cluster I use kills them when they hit its time limit.

I saw that SPAdes has a --continue option that restarts an assembly from the last checkpoint.

Is there anything equivalent to this when using phyluce_assembly_assemblo_spades?

Thanks for your help

Aurélien

brantfaircloth commented 1 year ago

There is not, because it can be difficult to sync up. You could, however, manually restart SPAdes on the sample(s) that failed. The command you will need to modify to restart is:

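    # we're using "single-cell" mode (--sc) here due to coverage variance
    # replace each <...> placeholder; --memory is the RAM limit in GB
    # --pe1-1/--pe1-2 are the paired reads, --pe1-s the unpaired (singleton) reads
    spades.py --careful \
        --sc \
        --threads <your value> \
        --memory <your RAM> \
        --cov-cutoff auto \
        --pe1-1 <path> \
        --pe1-2 <path> \
        --pe1-s <path> \
        -o assembly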
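
If a run was killed partway through, SPAdes itself can pick up from the last checkpoint with `spades.py --continue -o <dir>` (per the SPAdes manual, `-o` is the only option given alongside `--continue`). A minimal sketch of doing that manually for several samples: the hypothetical `resume_unfinished` helper (not part of phyluce) prints one resume command per SPAdes output directory you pass it. It assumes a finished run leaves a non-empty `contigs.fasta` in its `-o` directory; which directories to pass depends on how your phyluce output is laid out.

```shell
# Hypothetical helper (not part of phyluce): print a SPAdes resume command for
# every output directory given as an argument that has no finished contigs.
# ASSUMPTION: a completed run leaves a non-empty contigs.fasta in its -o dir.
resume_unfinished() {
    local dir
    for dir in "$@"; do
        dir="${dir%/}"                  # drop a trailing slash, if any
        [ -d "$dir" ] || continue       # skip anything that is not a directory
        if [ ! -s "$dir/contigs.fasta" ]; then
            # SPAdes restarts from the last checkpoint saved in this directory;
            # --continue is combined only with -o
            printf 'spades.py --continue -o %q\n' "$dir"
        fi
    done
}
```

Printing rather than running the commands lets you paste them into a new job script, so each resumed sample gets a fresh time allocation on the cluster.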

You might also consider downsampling the reads for each taxon. Usually, a job fails to complete because there are tons and tons of reads that SPAdes is trying to error-correct, and that correction takes a long time. I'm not sure what the time limits are on your HPC...
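
One way to downsample, sketched under assumptions: the hypothetical `subsample_fastq` function below randomly keeps roughly a given fraction of the 4-line records in a FASTQ file (gzipped or not). Running it on the READ1 and READ2 files with the same seed, using the same `awk`, makes `awk` draw the same random number for each record position, so the kept reads stay paired.

```shell
# Sketch: randomly keep about FRACTION of the reads in one FASTQ file.
# ASSUMPTION: plain 4-line FASTQ records; gzip -cdf decompresses .gz input
# and passes uncompressed input through unchanged.
# Same SEED on R1 and R2 -> same records kept in both -> pairs stay in sync.
subsample_fastq() {
    local in="$1" frac="$2" seed="$3"
    gzip -cdf "$in" | awk -v f="$frac" -v s="$seed" '
        BEGIN { srand(s) }
        NR % 4 == 1 { keep = (rand() < f) }   # decide once, at each record header
        keep'                                 # print all 4 lines of kept records
}
```

For example, `subsample_fastq <READ1 file> 0.25 100 | gzip > <output file>`, then the same call with the same fraction and seed on the READ2 file (file names are placeholders). If `seqtk` is installed, `seqtk sample -s <seed>` with the same seed on both files does the same job.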

I also tend to divide jobs up into batches, so instead of running, say, 72 taxa in a single run (which almost always fails at some point), I'll start 9 runs of 8 taxa each (which almost never fail, particularly with downsampled reads).
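
A sketch of that batching, assuming your assembly config has the `[samples]` layout used in the phyluce tutorial (one `taxon:/path/to/clean-reads` line per taxon). The hypothetical `split_assembly_conf` function writes one smaller config per batch, named `<prefix>-1.conf`, `<prefix>-2.conf`, and so on (a naming scheme chosen only for illustration).

```shell
# Hypothetical helper: split one phyluce assembly config into batch configs of
# PER_BATCH taxa each, every file starting with its own [samples] header.
# ASSUMPTION: one "name:path" line per taxon under a single [samples] section.
split_assembly_conf() {
    local conf="$1" per_batch="$2" prefix="$3"
    # drop the section header, blank lines, and comments; keep the taxon lines
    grep -v -e '^\[samples\]' -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$conf" |
        awk -v n="$per_batch" -v p="$prefix" '
            { b = int((NR - 1) / n) + 1; f = p "-" b ".conf" }   # batch number and file
            (NR - 1) % n == 0 { print "[samples]" > f }          # header opens each batch
            { print > f }                                        # taxon line into batch
            NR % n == 0 { close(f) }'                            # batch full: close file
}
```

With 72 taxa, `split_assembly_conf assembly.conf 8 batch` would write `batch-1.conf` through `batch-9.conf`; each can then go to its own cluster job, e.g. `phyluce_assembly_assemblo_spades --conf batch-1.conf --output spades-batch-1 --cores 8` (options as in the phyluce tutorial; check `--help` for your version).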

Hope that helps a bit, -b