faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_assembly_get_fastas_from_match_counts error - sequences longer than expected #145

Closed wjaratlerdsiri closed 5 years ago

wjaratlerdsiri commented 5 years ago

Hello,

I ran my split contigs against exon probes, and everything worked up to the step that produces the FASTA files below. The log file reported that the run completed normally. I used min-identity=95; min-coverage=95; regex='(ex-\d+)(_chr\d+)(_s\d+)(_e\d+.*)'.

source activate phyluce

for i in {0..30}
do
    phyluce_assembly_get_fastas_from_match_counts \
        --contigs exon_contig8g_set2 \
        --locus-db exon_contig8g_set2-exonprobe.${i}/probe.matches.sqlite \
        --match-count-output exon_contig8g_exonprobe.${i}-taxa-incomplete.conf \
        --output exon_contig8g_exonprobe.${i}-taxa-incomplete.fasta \
        --incomplete-matrix exon_contig8g_exonprobe.${i}-taxa-incomplete.incomplete \
        --log-path log
done

PROBLEM: The sequences in exon_contig8g_exonprobe.${i}-taxa-incomplete.fasta are much longer than both the probe and the lastz match.

Ex.

my probe name: ex-1652 (181bp)

ex-1652_chr10_s94852732_e94852913
GAAAACGGATTTGTGTGGGAGAGGGCCTGGCCCGCATGGAGCTGTTTTTATTCCTGACCT
TCATTTTACAGAACTTTAACCTGAAATCTCTGATTGACCCAAAGGACCTTGACACAACTC
CTGTTGTCAATGGATTTGCTTCTGTCCCGCCCTTCTATCAGCTGTGCTTCATTCCTGTCT

lastz (also 181bp; min-identity=95; min-coverage=95)

17236 >NODE_5776073_length_44302_cov_1933206 + 40935 41116 181 >ex-1652_chr10_s94852732_e94852913 - 0 181 181 ..................................................................................................................................................................................... 181M 181/181 100.0% 181/181 100.0%

exon_contig8g_exonprobe.0-taxa-incomplete.fasta

ex-1652_K3_FD00824591 |ex-1652
GTTACATATGTATACATGTGCCATATTGGTGTGCTGCACCCATTAACTCATCATTTAGCA
TTAGGTATATCTCCTAATGCTATCCCTCCCCCCTCCCCCCACCCCACAACAGTCCCCGAA
GTGTGATGTTCCCCTTCCTGTGTCCATGTGTTCTCATTGTTCAATTCCTGATTGTGGGCA
TTTTAGCAAGATTATTGTCACTGGCCTTAAGCTCATGCCTCTTATTACTTCGTCTATCTG
TCTGGAAATGGTACTGCTCTTCTTTGGAATGGTGTTTCATCATCTGTACATCAAAAGATT
TAACTGCATGATTACCACTGTTTCTTAAACCTTCGTGACTTCTTTACAGCTCAGTTCACC
...(44,340 bp)

Could you please explain how this happened? I would like to get it working, as I am close to the final step now.

Thanks, James

brantfaircloth commented 5 years ago

Hi James,

Although you can use phyluce with any type of probe (e.g. exon rather than UCE), I can't support the multitude of ways that people do this, so you are largely on your own. That said, you would generally expect contigs larger than your probes if the probes match a larger contig that was assembled by a program like Trinity. So, the probe looks like it matches this large contig perfectly, and you get the entire large contig when you extract the matching contigs from the assembled contigs. That's the expected behavior.
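
For readers who want only the matched region rather than the whole contig, the following is a minimal Python sketch of slicing a contig by its lastz match coordinates. It is not part of phyluce: `slice_match` and `reverse_complement` are hypothetical helper names. It assumes the target coordinates are zero-based and half-open, which is consistent with the row in the question (41116 - 40935 = 181) but should be checked against your own lastz output format.

```python
# Hypothetical helpers for illustration only; these are NOT phyluce functions.
# Assumption: match coordinates are zero-based, half-open [start, end).

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def slice_match(contig_seq, start, end, flank=0, probe_strand="+"):
    """Cut the matched region out of a contig.

    start/end are the match coordinates on the contig, flank is the number
    of extra bases to keep on each side (clipped to the contig bounds), and
    probe_strand="-" returns the region in the probe's orientation.
    """
    lo = max(0, start - flank)
    hi = min(len(contig_seq), end + flank)
    region = contig_seq[lo:hi]
    if probe_strand == "-":
        region = reverse_complement(region)
    return region


# Toy contig: 10 bases of left flank, a 7 bp "locus", 10 bases of right flank.
toy_contig = "A" * 10 + "GATTACA" + "C" * 10
toy_match = slice_match(toy_contig, 10, 17)
print(toy_match, len(toy_match))
```

With the coordinates from the question, the call would look like `slice_match(contig, 40935, 41116, probe_strand="-")`, where `contig` holds the sequence of NODE_5776073. Whether to trim at all, and how much flank to keep, depends on the downstream analysis.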