benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

merging replicates of sequences using DADA2 and Phyloseq #1843

Closed Abdy83 closed 1 month ago

Abdy83 commented 8 months ago

Heya, I have two runs (rep1, rep2) of 16S amplicon sequencing for 40 samples. The samples in 'rep1' are named like '1062-1-sampleNo1.R1.fastq' through '1062-1-sampleNo40.R1.fastq' and '1062-1-sampleNo1.R2.fastq' through '1062-1-sampleNo40.R2.fastq'. In 'rep2', the samples are named as '1062-2-sampleNo1.R1.fastq' through '1062-2-sampleNo40.R1.fastq' and '1062-2-sampleNo1.R2.fastq' through '1062-2-sampleNo40.R2.fastq'.

I have already checked many different sources, but it was challenging to find a solution that fits my case. Based on what I saw on GitHub, the best approach seems to be preparing 'seqtab.nochim' separately for 'rep1' and 'rep2', and then merging them, correct?

My first question is how to merge them. I tried 'merged_seqtabnochim <- mergeSequenceTables(seqtab1, seqtab2, orderBy = "abundance")', but this returned a table with 80 samples, whereas I expect 40. If I strip the '1062-1' and '1062-2' prefixes so that the sample names match, the function complains about duplicate sample names.

Could someone kindly let me know how I should process the data? If I merge the ASV tables of the two runs (reps), does the count for, say, 'ASVxxx' then have to be divided by two, since I merged two ASV tables? I apologize; I am a bit confused. P.S. I am using DADA2 and phyloseq only. Thank you so much. Cheers, Abdy

benjjneb commented 8 months ago

Keeping the replicates and using them in your downstream analysis is one option.

If you want to aggregate them by making sums of ASV counts across the two replicates for each sample, you could use the rowsum command on the combined table you get after mergeSequenceTables. You'll need to define the group vector that specifies the sample for all 80 rows first.
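
The rowsum approach described above can be sketched as follows. This is a minimal base-R illustration, not code from the thread: the small matrix is a made-up stand-in for the 80-row table returned by mergeSequenceTables, and the row names are assumed to follow the '1062-1-sampleNoX' / '1062-2-sampleNoX' pattern from the question (the actual row names depend on how sample names were derived from the fastq files).

```r
# Toy stand-in for the combined table (rows = sample-by-replicate, columns = ASVs).
# In practice this would be the output of mergeSequenceTables(seqtab1, seqtab2).
seqtab <- matrix(
  c(10, 0, 5,
    12, 3, 0,
     7, 1, 2,
     9, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("1062-1-sampleNo1", "1062-1-sampleNo2",
      "1062-2-sampleNo1", "1062-2-sampleNo2"),
    c("ASV1", "ASV2", "ASV3")
  )
)

# Group vector: one entry per row, naming the sample that row belongs to.
# Here it is obtained by stripping the assumed replicate prefix.
group <- sub("^1062-[12]-", "", rownames(seqtab))

# Sum ASV counts across replicates of the same sample.
seqtab_summed <- rowsum(seqtab, group)
seqtab_summed
```

The result has one row per sample, with each ASV count being the sum over the two replicates. By default rowsum sorts the groups alphabetically (so 'sampleNo10' would come before 'sampleNo2'); passing reorder = FALSE keeps them in order of first appearance instead.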