benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Run DADA2 for multiple studies #1652

Closed ghost closed 4 months ago

ghost commented 1 year ago

Hi,

I have output from different studies of 16S amplicon-sequenced reads with different sample sizes. They are from either the V4 or the V3-V4 region, but I want to analyze them together. I have run DADA2 on each study separately because they need to be treated differently. My questions are:

  1. How can I analyze them together?
  2. Can I run dada2 once on all datasets?

benjjneb commented 1 year ago

You should run each of them through dada2 individually, given that different methodologies were used in each study. Then you will want to combine the sequence tables from each study into a single merged sequence table. This is complicated by the variation in the sequenced region, though: in order to merge studies at the ASV level, you will have to trim all studies to a common region of the 16S gene. See an example of how to do this from our recent meta-analysis of the vaginal microbiome and preterm birth:

To obtain comparable ASVs among datasets, we divided the datasets into two groups based on the region of the 16S gene that was sequenced (V1-V2 and V4) with five datasets in the V1-V2 group and seven datasets in the V4 group. Then we truncated the original ASVs separately for each group to a common V1-V2 or V4 region in three steps: (1) align the original ASVs to the SILVA reference database using the mothur software (Schloss et al., 2009); (2) identify the overlapping sequencing region common to all ASVs in the group using an alignment visualization tool (MSAviewer); (3) truncate the original ASVs and remove alignment gaps using the extractalign and degapseq commands.
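The truncation in step (3) can be sketched in a few lines of Python. This is only an illustration of the idea, not the commands used in the meta-analysis: it assumes the ASVs have already been aligned to a reference, that the column range of the common region is known, and that every ASV covers that region. The function names, the toy sequences, and the column coordinates are all hypothetical.

```python
from collections import defaultdict

GAP_CHARS = "-."  # gap symbols commonly found in reference alignments


def truncate_aligned(aligned_seq, start, end):
    """Keep alignment columns [start, end) and strip gap characters."""
    region = aligned_seq[start:end]
    return "".join(ch for ch in region if ch not in GAP_CHARS)


def collapse_to_region(aligned_counts, start, end):
    """Turn {aligned ASV: count} into {truncated ASV: summed count}.

    ASVs that differ only outside the common region become identical
    after truncation, so their counts are added together.
    """
    collapsed = defaultdict(int)
    for aligned_seq, count in aligned_counts.items():
        trimmed = truncate_aligned(aligned_seq, start, end)
        if trimmed:  # skip ASVs with no bases inside the region
            collapsed[trimmed] += count
    return dict(collapsed)


# Toy data: one sample per study, 12 alignment columns,
# hypothetical common region = columns 3..8 (0-based, end-exclusive 9).
study_a = {"ACGTAC-GTTAA": 10, "TTGTAC-GTTAA": 5, "ACGTACCGTTAA": 3}
study_b = {"...TAC-GT...": 7}  # shorter amplicon, padded with '.'

region_a = collapse_to_region(study_a, 3, 9)
region_b = collapse_to_region(study_b, 3, 9)
print(region_a)  # {'TACGT': 15, 'TACCGT': 3}
print(region_b)  # {'TACGT': 7}
```

After truncation, the same sequence string (`TACGT` in this toy case) identifies the same ASV in both studies, which is what makes the merged table comparable across studies.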

AnaMariaCabello commented 8 months ago

Hi, I have a question about how to deal with a single sample in dada2. For the same experiment I have 2 sequencing runs (of 95 and 22 samples respectively); the sequencing of one of the 95 samples from the first run failed. The sequencing company re-sequenced that sample, but now I have to deal with this single sample as a different run. Do you think it is a major issue if I run dada2 on only one sample**? Or would you run this sample together with the other 94 even though they belong to different runs? I read that dada2 is not optimal for single samples and that other tools can be used instead (Deblur, UNOISE), but if it is not a major issue I would prefer to use dada2 (I'm dealing with 18S sequencing to evaluate microbial diversity in a time series, and this sample is just one more point in the series, nothing special).

** As recommended for different sequencing runs, I run dada2 separately on each run and then merge the sequence tables from all runs before chimera removal.

Thanks a lot, Ana

benjjneb commented 8 months ago

Or would you run this sample together with the other 94 even though they belong to different runs?

I think this is probably OK for a single sample. Just keep in the back of your mind that it came from a different run, in case it stands out as an outlier in subsequent analysis.
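For the per-run workflow mentioned earlier in the thread (denoise each run separately, then merge the sequence tables before chimera removal), the merge step and the "remember which run it came from" advice can be illustrated as follows. In dada2 itself the merge is done in R with `mergeSequenceTables`. The Python below is only a rough sketch of what such a merge does conceptually, with hypothetical function, sample, and run names.

```python
def merge_sequence_tables(tables):
    """Merge per-run tables into one table with a shared set of ASV columns.

    tables: {run_name: {sample_name: {asv_sequence: count}}}
    Returns (merged, sample_run), where merged maps every sample to counts
    for every ASV seen in any run (0 if absent), and sample_run records the
    run each sample came from.
    """
    all_asvs = sorted(
        {asv for table in tables.values()
         for counts in table.values()
         for asv in counts}
    )
    merged, sample_run = {}, {}
    for run, table in tables.items():
        for sample, counts in table.items():
            if sample in merged:
                # Design choice of this sketch: refuse ambiguous sample names.
                raise ValueError(f"duplicate sample name: {sample}")
            merged[sample] = {asv: counts.get(asv, 0) for asv in all_asvs}
            sample_run[sample] = run
    return merged, sample_run


# Toy example: one sample from the main run, one re-sequenced sample.
run1 = {"S1": {"ACGT": 4, "TTGA": 1}}
rerun = {"S95": {"ACGT": 2, "GGCC": 3}}

merged, sample_run = merge_sequence_tables({"run1": run1, "resequenced": rerun})
print(merged["S95"])      # {'ACGT': 2, 'GGCC': 3, 'TTGA': 0}
print(sample_run["S95"])  # resequenced
```

Keeping the `sample_run` mapping next to the merged table makes it easy to check later whether an apparent outlier is simply the sample that came from the other run.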