Closed phxu452 closed 3 years ago
Hi
Sorry I cannot reproduce without any data. But dada2 is responsible for counting things, ampvis2 simply takes the column sums, it doesn't know anything about the raw data. But the number of reads in the raw data will always be the highest due to filtering, have a look at "Track reads through the pipeline" here https://benjjneb.github.io/dada2/tutorial.html
Thanks very much for your reply. Sorry that I mention raw reads. I did track reads and get final reads number after nonchim step, which is 5145. Then I write otu table out and import to ampvis2. The reads of the same sample turns to be 4931. As I use ampvis2 to do rarefication and alpha diversity with rarefy function. It is hard to explain why the reads turns to be lower without filtering but only with counting.
Sorry I cannot assist if I cannot reproduce. I would need the data to debug, or a subset of it.
Thanks for your reply. Is it possible I send you the asv file with asv number and taxnomy so that you could see how to debug?
yes just upload here or email
Just send to your email :)
I'll reply here for others to benefit. amp_alphadiv
is correct. You need to figure out what those DADA2 numbers mean, but ampvis2
gives expected results.
library(data.table)
library(ampvis2)
#> Loading required package: ggplot2
#read table
asvtable <- fread("~/Downloads/ASV_table_check_for_raw_reads/ASV_table_ from Peihang.csv")
#load
d <- amp_load(
otutable = asvtable
)
#> Warning: No sample metadata provided, creating dummy metadata.
#summary
d
#> ampvis2 object with 3 elements.
#> Summary of OTU table:
#> Samples OTUs Total#Reads Min#Reads Max#Reads Median#Reads
#> 16 1652 319838 4931 36585 17344
#> Avg#Reads
#> 19989.88
#>
#> Assigned taxonomy:
#> Kingdom Phylum Class Order Family Genus
#> 1652(100%) 1583(95.82%) 1538(93.1%) 1405(85.05%) 1077(65.19%) 680(41.16%)
#> Species
#> 29(1.76%)
#>
#> Metadata variables: 2
#> SampleID, DummyVariable
#calculate sample depths with amp_alphadiv
alphadiv <- amp_alphadiv(
d
)
#> Warning: The data you have provided does not have
#> any singletons. This is highly suspicious. Results of richness
#> estimates (for example) are probably unreliable, or wrong, if you have already
#> trimmed low-abundance taxa from the data.
#>
#> We recommend that you find the un-trimmed data and retry.
#manual column sums of all samples
samplereads <- colSums(asvtable[,-c("ASV", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")])
#make sure order is the same
samplereads <- samplereads[alphadiv$SampleID]
#compare
DT <- data.table(
manual = samplereads,
amp_alphadiv = alphadiv$Reads
)
DT
#> manual amp_alphadiv
#> 1: 4931 4931
#> 2: 11984 11984
#> 3: 12818 12818
#> 4: 13790 13790
#> 5: 15114 15114
#> 6: 15971 15971
#> 7: 16513 16513
#> 8: 16782 16782
#> 9: 17906 17906
#> 10: 18886 18886
#> 11: 20267 20267
#> 12: 24958 24958
#> 13: 29286 29286
#> 14: 31990 31990
#> 15: 32057 32057
#> 16: 36585 36585
#identical
identical(DT[,manual], DT[,amp_alphadiv])
#> [1] TRUE
Created on 2021-11-17 by the reprex package (v2.0.1)
Thanks very much. I will check for dada2 process then.
Hi. I did dada2 to export asv table and taxonomy, sequence files. Then I combine the asv table and taxonomy table, and import these tables into ampvis2 to do further analysis. But the numbers of reads in the same sample file are not matching.
For example, in dada2, after final step of nonchim filtering, the raw reads in one sample is 5145. But in summary or amp_alphdiv commond, the raw reads of the same sample is lower, which is 4931. This happen to all the samples, with hundreds to even more than one thousand lower. I wonder is it because the way of ampvis2 counting raw reads is different?