Open karlyhiggins opened 2 years ago
When I have lots of assemblies to run, I usually split the input files into batches of 10-20 taxa and then run those batches across different nodes in parallel. Since it looks like you are using SLURM, you could look into job arrays, which might work well for your use case. I also tend to randomly downsample the R1 and R2 input files (using seqtk) to ~2-3 million reads each (for the tetrapod bait set) prior to assembly, which makes things go MUCH faster (10 million reads is a lot).
That said, we have found that inputting more reads than this can sometimes be beneficial for the assembly of UCE contigs from toepads and/or other historical sources.
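The downsampling step mentioned above could look something like the following with `seqtk sample`. The filenames and read count are placeholders for your own data, and seqtk must be installed; the key detail is using the same seed (`-s`) on both files so R1/R2 pairs stay in sync:

```sh
# Hypothetical filenames; requires seqtk on your PATH.
# Use the SAME -s seed for R1 and R2 so read pairing is preserved.
seqtk sample -s42 sample1_R1.fastq.gz 2000000 | gzip > sample1_R1.sub.fastq.gz
seqtk sample -s42 sample1_R2.fastq.gz 2000000 | gzip > sample1_R2.sub.fastq.gz
```

You would then point your assembly config at the downsampled files rather than the originals.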
Hello,
I am using Tutorial 1 to process reads from 170 individuals. Read counts run upwards of 10 million for some individuals, and I am stuck on the assembly step, which takes about a full day per individual. I am running phyluce on an HPC cluster, and everything appears to be working correctly. Are there any suggestions for speeding this up? My only thought has been to split the run so I can submit multiple individuals at once.
Here is an example of my submission script; I can use up to 20 cores per node and up to 10 nodes.
```sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH -p long.q
#SBATCH --mem=56G
#SBATCH --time=0-120:00:00
#SBATCH --job-name=sym_phyluce
#SBATCH --export=ALL

source /home/khiggins/miniconda3/etc/profile.d/conda.sh
conda activate phyluce-1.7.1

phyluce_assembly_assemblo_spades --conf assembly.conf --output spades-assemblies --memory 56 --cores 20
```
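A job-array version of this script might look like the sketch below. It assumes you have split your 170 individuals into 9 batches with one assembly config per batch; the `assembly_batch0.conf` ... `assembly_batch8.conf` naming is hypothetical, and SLURM fills in `SLURM_ARRAY_TASK_ID` for each array task:

```sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH -p long.q
#SBATCH --mem=56G
#SBATCH --time=0-120:00:00
#SBATCH --job-name=sym_phyluce
#SBATCH --export=ALL
#SBATCH --array=0-8

source /home/khiggins/miniconda3/etc/profile.d/conda.sh
conda activate phyluce-1.7.1

# One array task per batch of taxa; the per-batch config files
# (assembly_batch0.conf, etc.) are hypothetical names you would create
# by splitting your full assembly.conf into ~20-taxon chunks.
phyluce_assembly_assemblo_spades \
    --conf assembly_batch${SLURM_ARRAY_TASK_ID}.conf \
    --output spades-assemblies-batch${SLURM_ARRAY_TASK_ID} \
    --memory 56 --cores 20
```

Each batch then lands on its own node, so up to 9 batches assemble in parallel instead of 170 individuals running serially.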