benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Demultiplex and merge issues #177

Closed mentorwan closed 7 years ago

mentorwan commented 7 years ago

Hi Ben,

Thanks for the excellent tools and troubleshooting! I have paired-end samples with 2x125 bp reads. I used the split_libraries_fastq command in QIIME with a quality score of 19 to demultiplex the samples. I then ran BLAST on the first read from both the forward and reverse files, and got perfect matches for

Forward read: DSM 20617: 534 to 659
Reverse read: DSM 20617: 660 to 785

What is the minimum requirement for the mergePairs command? Do you have any recommendations for the split_libraries_fastq command?

Also, an earlier conversation mentioned the option mergePairs(..., justConcatenate=TRUE). If I use that, which downstream analyses will not work?

Thanks a lot!

benjjneb commented 7 years ago

Those reads don't overlap at all, so the regular mergePairs won't work (the default requirement is an overlap of 20 nts, but this can be changed: mergePairs(..., minOverlap=XX)).
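As a rough check of why these reads cannot merge, the overlap can be worked out from the BLAST coordinates quoted above. This is a minimal Python sketch of the arithmetic only, not dada2 code: the function name `overlap_length` is hypothetical, the coordinates are treated as inclusive, and the threshold of 20 is simply the default stated in this comment.

```python
def overlap_length(fwd_start, fwd_end, rev_start, rev_end):
    """Count reference positions covered by both reads (inclusive coordinates)."""
    shared = min(fwd_end, rev_end) - max(fwd_start, rev_start) + 1
    return max(0, shared)


# Coordinates taken from the BLAST hits reported in this thread.
overlap = overlap_length(534, 659, 660, 785)

# Default quoted in this comment; in dada2 it is set via minOverlap.
MIN_OVERLAP = 20
can_merge = overlap >= MIN_OVERLAP

print(overlap, can_merge)
```

The forward read ends at position 659 and the reverse read starts at 660, so the two reads are adjacent on the reference and share no positions. Lowering minOverlap would not help in this case.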

You can use justConcatenate=TRUE. It will insert 10 Ns as padding between the F/R reads, but chimera removal and taxonomic assignment should still work largely as expected.
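To make the padding concrete, here is a small Python sketch of what a concatenated sequence looks like. It is an illustration under stated assumptions, not the dada2 implementation: the helper names `reverse_complement` and `just_concatenate` are hypothetical, the toy reads are made up, and it assumes the reverse read is reverse-complemented before joining, as the mergePairs documentation describes.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def just_concatenate(fwd, rev, pad=10):
    """Join forward read, a run of N padding, and the reverse-complemented reverse read."""
    return fwd + "N" * pad + reverse_complement(rev)


# Toy reads for illustration only (not from the thread's data).
forward = "ACGTAC"
reverse = "TTGCA"

joined = just_concatenate(forward, reverse)
print(joined)
print(len(joined) == len(forward) + 10 + len(reverse))
```

The run of Ns marks a gap of unknown true length, which is why methods that depend on alignment across the whole sequence, such as tree building, may need adjusting.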

Other downstream applications, like constructing a phylogenetic tree between sequences, may have to be modified.

mentorwan commented 7 years ago

Thanks for the suggestion! It does seem that the reads don't overlap. I found out that the forward primer is 515F and the reverse primer is 806R, but the forward and reverse reads are each 126 bp, so in general they will not overlap. I will try the concatenate option.

One more question out of curiosity: why did some reads get merged? I got the following message when I ran mergePairs in the standard way.

"2339 paired-reads (in 5 unique pairings) successfully merged out of 127429 (in 1379 pairings) input."

benjjneb commented 7 years ago

You can try BLAST-ing those sequences to see what they are.

It's not uncommon for non-target-length amplicons to arise from non-target sources, such as chloroplasts or eukaryotes.