First of all, thank you for implementing such a great tool and responding to all these issues!
While running the pipeline on 48 soil samples of 16S V3-V4 amplicons, I became slightly worried about the sequence length distribution of my merged reads (after removing chimeras with removeBimeraDenovo). Firstly, the length distribution of the merged sequences is wide, ranging from 269 to 480 nt. Also, it appears that most merged sequences have a length of either ~402 bp or ~425 bp.
> table(nchar(getSequences(seqtab.nochim)))
Plotting this information with:
z <- table(nchar(getSequences(seqtab.nochim)))
w <- as.data.frame(z, col.names = c('sequence length','frequency'))
plot(w)
I get the following graph:
(sorry about the ugly graph)
Is this expected?
Extra Info:
Primers used: Pro341F (5′-CCTACGGGNBGCASCAG-3′) Pro805R (5′-GACTACNVGGGTATCTAATCC-3′)
Sequencing: Illumina MiSeq, 2x300 bp