Closed carolina671 closed 5 years ago
derepFastq only deals with exact matching between sequences. The dada command generates the denoised ASVs. You can tell that command to ignore real differences less than some threshold with the MIN_HAMMING parameter, e.g. dada(..., MIN_HAMMING=2) requires at least 2 differences to split ASVs, and hence would group together real ASVs that differ in just one place. But tread carefully. You could also consider clustering the output of dada2 with another tool to make traditional OTUs.
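To make the threshold concrete, here is a minimal base-R sketch of what "differ in just one place" means. The helper hamming_dist and the toy sequences are hypothetical and not part of dada2; dada makes its own comparison internally, so this only illustrates the idea and is not how the package computes it.

```r
# Hypothetical helper (not part of dada2): count mismatched positions
# between two equal-length sequences.
hamming_dist <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Toy sequences that differ at a single position.
s1 <- "ACGTACGT"
s2 <- "ACGTTCGT"
d <- hamming_dist(s1, s2)

# With MIN_HAMMING=2, at least 2 differences are needed to split ASVs,
# so a pair at distance 1 would be kept together.
would_split <- d >= 2
```

Running the same helper on two real ASVs of equal length gives a quick check of how many positions actually separate them before changing any dada options.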
Thanks. I think it is not a good idea to merge these ASVs, because if they are both highly abundant I would say that they could be SNPs. Could you please recommend a package for making traditional OTUs?
In R, you can use the DECIPHER package to group ASVs into OTUs, see for example the IdClusters function. Does @digitalwright know of any DECIPHER OTU workflows out there?
You can also consider outside packages such as usearch/mothur/qiime and the like if your goal is OTUs.
Yes, it is straightforward to generate OTUs with the DECIPHER package. There are two ways to do it, both of which use the IdClusters function and start from unaligned ASVs in the form of a DNAStringSet.
(1) Cluster the sequences directly (inexact but scales well to tens of thousands of sequences):
otus <- IdClusters(myXStringSet=dna, method="inexact", cutoff=0.03)
(2) Cluster via a distance matrix (exact but scales approximately quadratically):
DNA <- AlignSeqs(dna, processors=NULL) # there are many variants of this (see documentation)
d <- DistanceMatrix(DNA, processors=NULL)
otus <- IdClusters(d, method="complete", cutoff=0.03, processors=NULL)
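Once cluster assignments exist, the ASV counts still need to be summed per OTU. The following is a rough base-R sketch of that step, under stated assumptions: seqtab is a toy stand-in for a dada2 sequence table (samples in rows, ASVs in columns), and otus is a toy stand-in for the clustering result, assumed to be a data frame with one row per sequence and a cluster column. The names and values are invented for illustration; with real data, check that the rows of the clustering result line up with the columns of the sequence table.

```r
# Toy ASV count table: rows are samples, columns are ASVs (assumed layout).
seqtab <- matrix(c(10, 0, 5,
                    3, 7, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("sampleA", "sampleB"),
                                 c("ASV1", "ASV2", "ASV3")))

# Toy stand-in for the clustering result: one row per sequence,
# with a `cluster` column giving its OTU assignment.
otus <- data.frame(cluster = c(1, 1, 2),
                   row.names = c("ASV1", "ASV2", "ASV3"))

# Sum the counts of all ASVs that share a cluster, sample by sample.
otu_tab <- t(rowsum(t(seqtab), group = otus[colnames(seqtab), "cluster"]))
colnames(otu_tab) <- paste0("OTU", colnames(otu_tab))
```

Here ASV1 and ASV2 fall into the same cluster, so their counts are added into a single OTU column while ASV3 stays on its own.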
I hope that helps!
Thanks for all the info
Hi,
I have used the dada2 pipeline to process sequencing data from a gene other than 16S, called nifH. This gene can be found in multiple copies and can be transferred horizontally, so two species can share the exact same sequence, and it is not very conserved.
I would like to know if there is a way of modifying the derepFastq command to make it less stringent when grouping the ASVs. For example, in the OTU table I have found ASV sequences that differ by only one base (according to NCBI nBLAST). Is there a way for these sequences to be classified as a single ASV? Please see the two sequences below:
AAGCACCACCACACAAAACACAGTAGCCGGATTGGCTGAAATGGGCAGAAAAGTAATGGTTGTAGGATGCGACCCAAAAGCAGACTCTACACGTTTACTCCTTCATGGGTTGGCACAAAAAACAGTATTGGATACACTTCGCGACGAAGGCGAAGATGTTGAATTGGACGACGTAATGAAAGAAGGATTTAAAAACACCAGCTGTGTGGAATCCGGCGGTCCGGAACCGGGCGTTGGTTGTGCAGGCCGTGGTATTATCACTTCTATCAACCTTTTGGAACAACTTGGCGCTTACGATGCTGATAAACGACTTGATTACGTATTTTACGATGTACTTGGCG
AAGCACCACCACACAAAACACAGTAGCCGGATTGGCTGAAATGGGCAGAAAAGTAATGGTTGTAGGATGCGACCCTAAAGCAGACTCTACACGTTTACTCCTTCATGGGTTGGCACAAAAAACAGTATTGGATACACTTCGCGACGAAGGCGAAGATGTTGAATTGGACGACGTAATGAAAGAAGGATTTAAAAACACCAGCTGTGTGGAATCCGGCGGTCCGGAACCGGGCGTTGGTTGTGCAGGCCGTGGTATTATCACTTCTATCAACCTTTTGGAACAACTTGGCGCTTACGATGCTGACAAACGACTTGATTACGTATTTTACGATGTACTTGGTG
Thanks,