benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
460 stars 142 forks source link

Merging dada2 OTU table with taxonomic tree #422

Closed aboyd95 closed 6 years ago

aboyd95 commented 6 years ago

Hi,

I'm trying to create a phyloseq object that contains an OTU table and a phylogenetic tree downloaded from the SILVA database. I've followed the very helpful dada2 tutorial to create the OTU table and taxonomy assignments, but I can't merge the tree and the OTU table since they do not have the same taxa names. Everything that I've read on here suggests that it's normal for the OTU table to have the column names listed as the sequences themselves, since the "taxa" object will link the sequences to the assigned taxonomies. This is the script that I am using.

`path <- "~localpath" list.files(path)

SAMPLENAME_R2_001.fastq fnFs <- sort(list.files(path, pattern="_R1_001.fastq", full.names = TRUE)) fnRs <- sort(list.files(path, pattern="_R2001.fastq", full.names = TRUE)) sample.names <- sapply(strsplit(basename(fnFs), ""), [, 1)

filt_path <- file.path(path, "filtered") # Place filtered files in filtered/ subdirectory filtFs <- file.path(filt_path, paste0(sample.names, "_F_filt.fastq.gz")) filtRs <- file.path(filt_path, paste0(sample.names, "_R_filt.fastq.gz"))

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, trimLeft=10, truncLen=c(280,240), maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE) # On Windows set multithread=FALSE head(out)

errF <- learnErrors(filtFs, multithread=TRUE) errR <- learnErrors(filtRs, multithread=TRUE)

derepFs <- derepFastq(filtFs, verbose=TRUE) derepRs <- derepFastq(filtRs, verbose=TRUE) names(derepFs) <- sample.names names(derepRs) <- sample.names

dadaFs <- dada(derepFs, err=errF, multithread=TRUE) dadaRs <- dada(derepRs, err=errR, multithread=TRUE) dadaFs

mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE) head(mergers)

seqtab <- makeSequenceTable(mergers) #Warning: The sequences being tabled vary in length dim(seqtab) table(nchar(getSequences(seqtab)))

seqtab2 <- seqtab[,nchar(colnames(seqtab)) %in% seq(418,448)] table(nchar(getSequences(seqtab2)))

seqtab.nochim <- removeBimeraDenovo(seqtab2, method="consensus", multithread=TRUE, verbose=TRUE) dim(seqtab.nochim) sum(seqtab.nochim)/sum(seqtab2)

set.seed(100) taxa <- assignTaxonomy(seqtab.nochim, "~localpath/silva_nr_v128_train_set.fa", multithread=TRUE) View(taxa)

Add in sample metadata

Metadata <- read.delim("~localpath/Metadata.txt", row.names = 2)

Add in phylogenetic tree

treefile = read_tree("~localpath/LTPs128_SSU_tree.newick.txt", errorIfNULL = T)

ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=F), tax_table(taxa), sample_data(Metadata))

physeq<-merge_phyloseq(ps, treefile)`

Using taxa_names(ps) and taxa_names(tree file), I can see that the tree has what appears to be proper taxonomy names, but ps is using the sequences as taxonomy names. I am assuming that I have to alter the OTU table in some way to contain the taxonomy assignments, but I do not know how to go about this. Sorry if the issue here is glaringly obvious, any help is appreciated!

benjjneb commented 6 years ago

Sorry I am a bit confused as to what the problem is.

Is a command failing? Can you specify which one exactly?

aboyd95 commented 6 years ago

I don't believe that any commands are failing, it's just a matter of my lack of knowledge with OTU tables in general. I'm failing to see how to make the OTU table and the phylogenetic tree match so that I can combine them into a single phyloseq object. I've tried renaming the OTU table and the taxonomy table entries to OTU_1, OTU_2, etc instead of the DNA sequences at the advice of a colleague, but that has not helped me to combine the table and the tree yet either.

Is there a specific command that I need to use to do this, or a specific naming convention that will make my tables compatible with SILVA's phylogenetic tree?

Thanks!

benjjneb commented 6 years ago

OK, so making the phyloseq object is failling, because the names don't match.

Is the tree you are importaning the Silva "Living Tree Project" tree?

If so, that can't be simply combined with your sequence data, because your sequences aren't in that tree. That tree only contains a (small) subset of all known 16S sequence variants.

In order to get a tree that contains your sequences, you can either build a de novo tree from the sequences (most common method), or use an algorithm that inserts those sequences into a prebuilt tree like the Silva LTP tree (e.g. pplacer and others). That's beyond the scope of the dada2 R package though, but "phylogenetic placement" would be a good keyword to start looking for such methods.

aboyd95 commented 6 years ago

Thank you! That explains why I couldn't find anything about why the object creation was failing.

Thanks for your help!