faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Some sequences fail to assemble using phyluce_assembly_assemblo_spades? #167

Closed ymilesz closed 5 years ago

ymilesz commented 5 years ago

Hi Brant,

I am trying to use phyluce_assembly_assemblo_spades to assemble my clean fastq files, and I noticed that some of the sequences fail with the following messages, after which the program just moves on to the next one:

2019-08-27 07:35:42,750 - phyluce_assembly_assemblo_spades - INFO - ------------- Processing YMZ009_Diplolepis_bassetti ----------$
2019-08-27 07:35:42,751 - phyluce_assembly_assemblo_spades - INFO - Finding fastq/fasta files
2019-08-27 07:35:42,770 - phyluce_assembly_assemblo_spades - INFO - File type is fastq
2019-08-27 07:35:42,770 - phyluce_assembly_assemblo_spades - INFO - Running SPAdes for PE data
2019-08-27 08:03:10,153 - phyluce_assembly_assemblo_spades - WARNING - Did not clean all fastq/fasta files from /ufrc/lucky/yuanme$
2019-08-27 08:03:10,177 - phyluce_assembly_assemblo_spades - INFO - Symlinking assembled contigs into /ufrc/lucky/yuanmeng.zhang/S$

When I look at this sample's spades.phyluce.log, this is the error:

== Error == system call for: "['/apps/phyluce/20190308/share/spades-3.12.0-1/bin/spades-hammer', '/ufrc/lucky/yuanmeng.zhang/Syco$

So is it something to do with BayesHammer? But why does this happen to only a few samples rather than all of them? I ran this entire batch through ABySS and it looked fine.

Thanks,

Miles
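Because a failed sample only leaves a warning in the main log and the run carries on, it can help to scan every per-sample log for the error marker shown above. The following is a minimal sketch, not part of phyluce: the function name `find_failed_samples` is hypothetical, and it assumes each sample's `spades.phyluce.log` sits somewhere under a single assembly output directory.

```python
from pathlib import Path

# Marker copied from the spades.phyluce.log excerpt quoted above.
ERROR_MARKER = "== Error =="


def find_failed_samples(assembly_root, log_name="spades.phyluce.log"):
    """Map each log's directory (relative to assembly_root) to its first error line.

    Assumption: the directory layout is hypothetical; adjust the root and
    log_name to match where your run actually wrote its SPAdes logs.
    """
    root = Path(assembly_root)
    failed = {}
    for log_path in sorted(root.rglob(log_name)):
        with open(log_path, errors="replace") as handle:
            for line in handle:
                if ERROR_MARKER in line:
                    # Record only the first error line per log, then move on.
                    failed[str(log_path.parent.relative_to(root))] = line.strip()
                    break
    return failed
```

Calling `find_failed_samples` on the assembly output directory returns a dictionary with one entry per log that contains the marker, which gives a quick list of samples to rerun.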

brantfaircloth commented 5 years ago

Hi Miles,

It may be that BayesHammer is running into RAM limitations. That's usually what kills a job, and the reason that some/most samples make it through is that they have fewer reads. To work around this problem, we usually subsample the reads for each taxon to something reasonable, like 2 M read pairs per tip (this is for ~4.5k loci; taxa with fewer reads than that are left unsubsampled). That way we generally ensure the run will complete for all tips.

-b

ymilesz commented 5 years ago

Thanks Brant,

I will try subsampling. Can I just follow "Subsample reads for R1 and R2 using seqtk" on pg 47 of your lab doc?

brantfaircloth commented 5 years ago

That should get you pretty close. You may need to adjust that little snippet for your needs.
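For readers without the lab doc to hand, the subsampling idea discussed in this thread can be sketched in plain Python. This is not the seqtk snippet referred to above and not part of phyluce. It is a hedged illustration using reservoir sampling, and the function names (`open_text`, `fastq_records`, `subsample_pairs`) are hypothetical. It assumes unwrapped FASTQ (four lines per read) with R1 and R2 in the same order, and it keeps the whole sample in memory, so a dedicated tool such as seqtk is likely the better choice for real data.

```python
import gzip
import random
from itertools import islice, zip_longest


def open_text(path, mode="rt"):
    """Open plain or gzip-compressed text, chosen by file extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield 4-line FASTQ records (assumes unwrapped FASTQ)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, n_pairs=2_000_000, seed=42):
    """Reservoir-sample up to n_pairs read pairs, keeping R1 and R2 in sync.

    Samples with fewer than n_pairs pairs are written out unchanged.
    Returns (pairs_seen, pairs_written).
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    with open_text(r1_in) as h1, open_text(r2_in) as h2:
        for rec1, rec2 in zip_longest(fastq_records(h1), fastq_records(h2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 have different numbers of reads")
            seen += 1
            if len(reservoir) < n_pairs:
                reservoir.append((rec1, rec2))
            else:
                # Keep the new pair with probability n_pairs / seen.
                j = rng.randrange(seen)
                if j < n_pairs:
                    reservoir[j] = (rec1, rec2)
    with open_text(r1_out, "wt") as o1, open_text(r2_out, "wt") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return seen, len(reservoir)
```

Sampling R1 and R2 together, rather than each file separately, is what keeps the mates paired. With a tool that samples one file at a time, the usual way to get the same effect is to pass an identical random seed for both files.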