Closed GingerMATE closed 1 month ago
I'm not sure what is defining the readcount
function, but it is not a phyloseq function.
If you are trying to get the total abundances (numbers of reads) in each sample from a phyloseq object, the appropriate function to use is sample_sums
.
I tried sample_sums to check the reads, but it gave me the same result.
physeq_ps137 <- phyloseq(otu_table(asv_counts_table, taxa_are_rows=TRUE),
tax_table(taxa_with_species), sample_data(sample_metadata))
physeq_ps137
sample_sums(physeq_ps137)
readcount(physeq_ps137)
ArcB1 AuP1 AuB1 AuB2 AuB3
42770 45601 32479 53549 27221
ArcB1 AuP1 AuB1 AuB2 AuB3
42770 45601 32479 53549 27221
The problematic step is between the seqtab.nochim and the physeq object. I assigend my taxa with the tryRC=TRUE argument in the assignTaxonomy function. Do you think that this could have an affect on the number of reads?
Here is the code i used to assign the taxonomy:
unite.ref <- "DNA_results/Dada2/silva_nr99_v138.1_train_set.fa.gz" # CHANGE ME to location on your machine
seqtab.nochim <- readRDS("DNA_results/seqtab_nochim.rds")
taxa <- assignTaxonomy(seqtab.nochim, unite.ref, multithread = parallel::detectCores(), tryRC = TRUE)
taxa <- addSpecies(taxa, "DNA_results/Dada2/silva_species_assignment_v138.1.fa.gz")
taxa.print <- taxa # Removing sequence rownames for display only
rownames(taxa.print) <- NULL
head(taxa.print)
write.table (taxa, file ="DNA_results/Dada2/taxa_PS137_general_def_tryRCTRUE.csv", row.names = T, sep = "\t")
asv_seqs <- rownames(taxa_table)
asv_names <- character(nrow(taxa_table))
for (i in 1:nrow(taxa_table)) {
asv_names[i] <- paste("ASV", i, sep="_")
}
rownames(taxa_table) <- asv_names
write.table(taxa_table, "DNA_results/Dada2/taxa_ps137_tryRCTRUE", sep="\t", quote=F, col.names=T)
I would suggest trying to construct the phyloseq object directly from the original sequence table and tax table from DADA2 (i.e. that contain the full DNA sequences as IDs), and checking if that solves the change in read counts issue.
It is simpler and less error prone to then switch to "ASV1" style IDs in the phyloseq object. Code for this is included in the DADA2 tutorial "Handoff to phyloseq" section:
dna <- Biostrings::DNAStringSet(taxa_names(ps))
names(dna) <- taxa_names(ps)
ps <- merge_phyloseq(ps, dna)
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))
ps
Hello,
im encountering a similar issue to related to my sample reads. After completing the Dada2 pipeline according to the tutorial, i created a phyloseq object, but in doing so i loose reads in almost all my samples. Some of my samples loose over 50%.
Here is an excerpt from my track_reads file with added entry for when i create my phyloseq object:
sample | input | filtered | dadaF | dadaR | merged | nonchim | finalPercReadsKept % | reads physeq obj ArcB1 | 82650 | 47038 | 45576 | 46265 | 43855 | 42770 | 51.7 | 42770 AuB1 | 90559 | 56066 | 54249 | 55233 | 52257 | 50278 | 55.5 | 32479 AuB2 | 100718 | 62615 | 59976 | 60933 | 57438 | 55009 | 54.6 | 53549 AuB3 | 137090 | 81540 | 80356 | 80983 | 79213 | 76150 | 55.5 | 27221
There is a large iscrepancy between the seqtab. nochim reads and the phyloseq object.
This is the code i use to create my seqtab.nochim file in the pipeline:
And here is the code i use to create my phyloseq object:
Any idea what could cause the reads to disappear? Or am i using the readcount function wrong? Thanks in advance!