faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Error in "phyluce_assembly_match_contigs_to_probes" #272

Closed: kq1986 closed this issue 2 years ago

kq1986 commented 2 years ago

Hello!

I'm trying to match my contigs (~1.9G, assembled using SPAdes) to my UCE probe set using "phyluce_assembly_match_contigs_to_probes", but I get the following error:

Traceback (most recent call last):
  File "/public/home/kq/miniconda3/envs/phyluce/bin/phyluce_assembly_match_contigs_to_probes", line 421, in <module>
    main()
  File "/public/home/kq/miniconda3/envs/phyluce/bin/phyluce_assembly_match_contigs_to_probes", line 345, in main
    raise EnvironmentError("lastz: {}".format(lztstderr))
OSError: lastz: b'FAILURE: in new_position_table(), prev[] array size (7,512,212,992) exceeds allocation limit of 4,294,967,279; consider using lastz_32, or setting max_malloc_index for a special build, or breaking your target sequence into smaller pieces\n'

I guess this is because the contigs file is too big. Do you have any suggestions for troubleshooting this?

Thanks very much!

brantfaircloth commented 2 years ago

This error is reported by lastz when the sequence data you give it are too large for the way lastz is designed: here, the prev[] array it needs (7,512,212,992) exceeds its allocation limit of 4,294,967,279. A 1.9 G contig file is very large if you have performed standard UCE enrichments, so I am not sure how that size could be correct, unless what you really have is a sequenced WGS library in which you are trying to identify UCE loci.

If this is the case (you are identifying UCE loci in WGS libraries), follow Tutorial 3, and that should fix the problem. You could also follow this tutorial if you really do have a library that was enriched and the resulting contig file is just enormous (although I don't quite understand how that could be, and it suggests something about the enrichment was off).

Finally, you could split your contig file into thirds or quarters (or so) and follow the standard protocol on each piece. Then, combine the UCE loci that are identified across all 3-4 runs into a single file, and run that single file back through the UCE identification process again.
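
For anyone taking the splitting route, here is a minimal sketch of one way to break a contig FASTA file into a few pieces of roughly equal total length. It is not part of phyluce. The function names (`read_fasta`, `split_fasta`) and the output naming scheme (`<stem>.partN.fasta`) are hypothetical and chosen for illustration, so the output names may need adapting to the directory layout phyluce expects. It streams one contig at a time, so it should not need to hold the whole 1.9 G file in memory.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, n_parts, out_dir):
    """Write the contigs in `path` into `n_parts` files of similar total length.

    Output names (<stem>.partN.fasta) are a hypothetical convention.
    Returns the list of output paths and the total bases written to each.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    out_paths = [out_dir / f"{stem}.part{i + 1}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in out_paths]
    totals = [0] * n_parts
    try:
        for header, seq in read_fasta(path):
            # Greedy balancing: send each contig to the currently smallest part.
            i = totals.index(min(totals))
            handles[i].write(f">{header}\n{seq}\n")
            totals[i] += len(seq)
    finally:
        for handle in handles:
            handle.close()
    return out_paths, totals
```

A call such as `split_fasta("my_taxon.contigs.fasta", 4, "split_contigs")` (file and directory names are placeholders) would write four files. Contigs are never cut in the middle, so each piece remains a valid FASTA file, and each sequence is written on a single line.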

kq1986 commented 2 years ago

Hi Brant,

I followed Tutorial 3 and that fixed my problem. Thank you very much for the quick response!!

brantfaircloth commented 2 years ago

Cool - glad to hear it!