faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Help: Assembly stuck on "Processing contigs" #327

Open chloejelley opened 9 months ago

chloejelley commented 9 months ago

I am currently attempting to run the assembly step of the phyluce workflow using large server resources on my university campus. I first attempted with a small subset (5) of my samples using the following code:

    phyluce_assembly_assemblo_spades \
        --conf assembly_outgroup.conf \
        --output spades-assemblies-outgroup \
        --cores 64 \
        --memory 500

This took 79 hours to finish 4 samples, and then the 5th sample was stuck on "Processing contigs" for almost 12 hours, at which point I stopped everything.

Then I tried again on a bigger server with 10 samples, and not a single sample had finished assembling after 22 hours. The sample was stuck on "Processing contigs" again. This is how I set it up:

    phyluce_assembly_assemblo_spades \
        --conf /home/cmj96/Desktop/IridoUCE/assembly1.conf \
        --output /home/cmj96/Desktop/IridoUCE/spades-assemblies1 \
        --cores 88 \
        --memory 1024

Am I setting this up incorrectly? Is there such a thing as using too many cores? Any insight you could provide would help!

brantfaircloth commented 9 months ago

This sounds like it's taking (much) longer than it should. How much data do you have for each of these individuals, and what type of data are these (e.g. from the avian bait set, etc.)? Are you able to check on server utilization while the run is occurring?

Have you tried to run spades on a single sample outside of phyluce? That might be a good place to start troubleshooting - my guess is that the data going into spades may be causing the issue (but really hard to say - just guessing).
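For reference, running SPAdes directly on one sample might look roughly like the sketch below. The function name, the thread and memory values, and the `SPADES` override variable are assumptions made for illustration; `-1`, `-2`, `-o`, `-t` (threads), and `-m` (memory limit in GB) are standard `spades.py` options. This is not a reproduction of the exact command phyluce issues, which may pass additional options.

```shell
# Minimal sketch: assemble one paired-end sample with SPAdes outside of phyluce.
# The function name and the resource values below are illustrative assumptions.
run_spades_single() {
    # $1 = R1 FASTQ, $2 = R2 FASTQ, $3 = output directory
    # SPADES can be set to point at a specific spades.py; defaults to the one on PATH.
    # -t sets the thread count; -m sets the memory limit in GB.
    "${SPADES:-spades.py}" -1 "$1" -2 "$2" -o "$3" -t 16 -m 128
}
```

Calling it as `time run_spades_single sample_R1.fastq.gz sample_R2.fastq.gz spades-single-test` (hypothetical file names) gives a wall-clock figure for one sample, and watching `top` in another terminal while it runs shows whether the CPUs and memory are actually being used.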

chloejelley commented 9 months ago

I am using the Hymenopteran bait set! According to the phyluce_assembly_get_fastq_lengths script that you provide in the phyluce tutorial, my samples range from 1,019 to 11,496,026 reads and from 119,584 to 1,177,959,990 base pairs (so quite a large range). I will start another run focusing on one of my larger samples and check on the server utilization. I will also try running spades outside of phyluce. I do think this issue is more on the spades side. Thank you!
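A quick way to cross-check per-file read and base counts without phyluce is sketched below. The function name is made up for this example, and it assumes standard 4-line FASTQ records (no wrapped sequence lines); it is not what phyluce_assembly_get_fastq_lengths does internally.

```shell
# Minimal sketch: count reads and bases in one FASTQ file (plain or gzipped).
# Assumes each record is exactly 4 lines, with the sequence on line 2.
fastq_stats() {
    # gzip -cdf decompresses .gz input and passes uncompressed input through unchanged
    gzip -cdf "$1" | awk 'NR % 4 == 2 { reads++; bases += length($0) }
        END { printf "%.0f reads\t%.0f bases\n", reads + 0, bases + 0 }'
}
```

Running `fastq_stats` on each file (for example in a `for f in *.fastq.gz` loop) shows which samples sit at the extremes of the range.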

brantfaircloth commented 9 months ago

11 M reads is a lot - but not crazy given the amount of RAM you have available. One other thing to try would be to randomly downsample some of your larger files to see how they assemble w/ fewer reads. You can use a program like seqtk to do this (shooting for something like 2-3 M reads per sample).
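A downsampling step along those lines could be sketched as follows. The file names, the seed, and the function name are placeholders chosen for illustration; `seqtk sample -s SEED in.fq N` is the standard seqtk subsampling command, and using the same seed for both files keeps the read pairs matched.

```shell
# Minimal sketch: randomly downsample a paired-end sample with seqtk.
# File names, seed, and target are illustrative assumptions; substitute your own.
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz
N=2000000   # target reads per file, within the suggested 2-3 M range
SEED=42     # any integer, but it must be identical for both mates

downsample_pair() {
    # $1 = R1 FASTQ (.fastq.gz), $2 = R2 FASTQ (.fastq.gz)
    # seqtk writes uncompressed FASTQ to stdout, so recompress it with gzip.
    seqtk sample -s "$SEED" "$1" "$N" | gzip > "${1%.fastq.gz}.sub.fastq.gz"
    seqtk sample -s "$SEED" "$2" "$N" | gzip > "${2%.fastq.gz}.sub.fastq.gz"
}

# Run only when seqtk is installed and both input files exist.
if command -v seqtk >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
    downsample_pair "$R1" "$R2"
fi
```

The resulting `.sub.fastq.gz` files can then be listed in a separate assembly config so the downsampled run does not overwrite anything from the full-data attempt.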