Open lynneitelson opened 2 years ago
It's probably easiest to not write things out to csv files, but to either save them as RDS files (saveRDS
, readRDS
) that maintain the full structure of the R object, or to just do both analyses in a single R session. For example, the dada2 tutorial ends with a section showing how to move the data into the phyloseq package, which works well with decontam
.
That said, if you need to use .csv
files as an intermediate, then you need to pay close attention to the options of the read.table
or read.csv
functions when you import the data, and then modify the imported data as needed to match the correct data format.
IN short, for the meta data you want a data.frame with rownames
equal to the sample names. And you want the ASV table to have rows corresponding to samples, columns corresponding to ASVs, and be in matrix format. That is the data type that you will have at the end of the DADA2 tutorial workflow, but it won't be preserved in exactly that format when saving/reading from a csv file unless one is careful with the flags in those export/import functions.
Thanks!
I used the phyloseq object from the dada2. However when i attempt to continue the code i get the following error:
`` ps <- phyloseq(otu_table(seqtab, taxa_are_rows=FALSE), tax_table(taxa1)) ps <- prune_samples(sample_names(ps) != "Mock", ps) # Remove mock sample
dna <- Biostrings::DNAStringSet(taxa_names(ps)) names(dna) <- taxa_names(ps) ps <- merge_phyloseq(ps, dna) taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps))) ps
head(sample_data(ps)) df <- as.data.frame(sample_data(ps)) "
Output:
Thank you,
Lynne
The phyloseq object you have created does not contain sample metadata, hence the sample_data(ps)
command cannot return anything and you get this error.
Some guidance on that on the phyloseq site: https://joey711.github.io/phyloseq/merge.html#merge_phyloseq
In short, you'll need to read in your sample metadata into a data.frame with rows that are samples and columns that are different sample variables. This is often stored in the form of a spread sheet or a tsv/csv file. If in a spreadsheet, exports it as a csv and it can be read into a data.frame in R with read.csv
.
Thanks!
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample" contamdf.prev <- isContaminant(ps, method="prevalence", neg="is.neg") table(contamdf.prev$contaminant)
How do I implement the positive control (mock community) in the package?
Lynne
There is no method in decontam that uses positive controls to identify contaminants. You can read about the two approaches for contaminant identification (frequency
and prevalence
) implemented by decontam in the paper introducing the method: https://doi.org/10.1186/s40168-018-0605-2
Thank you! `` ps2 <- phyloseq(sample_data(metadata_decontem), otu_table(seqtab, taxa_are_rows=FALSE), tax_table(taxa1))
head(sample_data(ps))
df_decontem <- as.data.frame(sample_data(ps2)) # Put sample_data into a ggplot-friendly data.frame df_decontem$LibrarySize <- sample_sums(ps2) df_decontem <- df_decontem[order(df_decontem$LibrarySize),] df_decontem$Index <- seq(nrow(df_decontem)) df_decontem$Sample_or_control <- metadata_decontem$Sample_or_Control
ggplot(data = df_decontem, aes(x=Index, y=LibrarySize, color = Sample_or_control)) + geom_point()
sample_data(ps2)$is.neg <- sample_data(ps2)$Sample_or_Control == "Control Sample" contamdf.prev <- isContaminant(ps2, method="prevalence", neg="is.neg") table(contamdf.prev$contaminant) head(which(contamdf.prev$contaminant)) `` I got the following outputs, what might of gone wrong along the way?
Thank you,
Lynne
The table also had a lot of NA's in it
Hi!
I'm pretty new to all of this. I have finished the dada2 amplicon sequencing output and have the asv ouput. I added the code of the input, is the input of the original data correct? The original file has four samples + Negative and positive control.
Thank you!
decontem_data.csv
`` library(phyloseq) library(ggplot2) install.packages("decontem") library(decontam)
setwd("D:/MichaI20/Desktop/meta/bbduk/decontem") decontem_data <- read.csv("decontem_data.csv", header = TRUE) decontem_data[1:3, 1:3] ASV X1 X2 1 1 19677 6766 2 2 29402 8794 3 3 23870 7339
decontam_matrix <- data.matrix(decontem_data, rownames.force = NA) decontam_matrix[1:3, 1:3] X X1 X2 [1,] 2770 19677 29402 [2,] 1672 6766 8794 [3,] 2640 675 1232 decontem_metadata.csv decontem_data <- read.csv("decontem_metadata.csv", header = TRUE) decontem_data[1:2, 1:2] one true_sample 1 two true_sample 2 three true_sample decontam_matrix <- data.matrix(decontem_data, rownames.force = NA) decontam_matrix[1:2, 1:2] one true_sample [1,] 4 2 [2,] 3 2 "