Unmatch raw reads that exporting from DADA2 to importing to Ampvis2

phxu452 commented 3 years ago

Hi. I did dada2 to export asv table and taxonomy, sequence files. Then I combine the asv table and taxonomy table, and import these tables into ampvis2 to do further analysis. But the numbers of reads in the same sample file are not matching.

For example, in dada2, after final step of nonchim filtering, the raw reads in one sample is 5145. But in summary or amp_alphdiv commond, the raw reads of the same sample is lower, which is 4931. This happen to all the samples, with hundreds to even more than one thousand lower. I wonder is it because the way of ampvis2 counting raw reads is different?

KasperSkytte commented 3 years ago

Hi

Sorry I cannot reproduce without any data. But dada2 is responsible for counting things, ampvis2 simply takes the column sums, it doesn't know anything about the raw data. But the number of reads in the raw data will always be the highest due to filtering, have a look at "Track reads through the pipeline" here https://benjjneb.github.io/dada2/tutorial.html

phxu452 commented 3 years ago

Thanks very much for your reply. Sorry that I mention raw reads. I did track reads and get final reads number after nonchim step, which is 5145. Then I write otu table out and import to ampvis2. The reads of the same sample turns to be 4931. As I use ampvis2 to do rarefication and alpha diversity with rarefy function. It is hard to explain why the reads turns to be lower without filtering but only with counting.

KasperSkytte commented 3 years ago

Sorry I cannot assist if I cannot reproduce. I would need the data to debug, or a subset of it.

phxu452 commented 3 years ago

Thanks for your reply. Is it possible I send you the asv file with asv number and taxnomy so that you could see how to debug?

KasperSkytte commented 3 years ago

yes just upload here or email

phxu452 commented 3 years ago

Just send to your email :)

KasperSkytte commented 3 years ago

I'll reply here for others to benefit. amp_alphadiv is correct. You need to figure out what those DADA2 numbers mean, but ampvis2 gives expected results.

library(data.table)
library(ampvis2)
#> Loading required package: ggplot2
#read table
asvtable <- fread("~/Downloads/ASV_table_check_for_raw_reads/ASV_table_ from Peihang.csv")

#load
d <- amp_load(
  otutable = asvtable
)
#> Warning: No sample metadata provided, creating dummy metadata.

#summary
d
#> ampvis2 object with 3 elements. 
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads 
#>           16         1652       319838         4931        36585        17344 
#>    Avg#Reads 
#>     19989.88 
#> 
#> Assigned taxonomy:
#>      Kingdom       Phylum        Class        Order       Family        Genus 
#>   1652(100%) 1583(95.82%)  1538(93.1%) 1405(85.05%) 1077(65.19%)  680(41.16%) 
#>      Species 
#>    29(1.76%) 
#> 
#> Metadata variables: 2 
#>  SampleID, DummyVariable

#calculate sample depths with amp_alphadiv
alphadiv <- amp_alphadiv(
  d
)
#> Warning: The data you have provided does not have
#> any singletons. This is highly suspicious. Results of richness
#> estimates (for example) are probably unreliable, or wrong, if you have already
#> trimmed low-abundance taxa from the data.
#> 
#> We recommend that you find the un-trimmed data and retry.

#manual column sums of all samples
samplereads <- colSums(asvtable[,-c("ASV", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")])
#make sure order is the same
samplereads <- samplereads[alphadiv$SampleID]

#compare
DT <- data.table(
  manual = samplereads,
  amp_alphadiv = alphadiv$Reads
)
DT
#>     manual amp_alphadiv
#>  1:   4931         4931
#>  2:  11984        11984
#>  3:  12818        12818
#>  4:  13790        13790
#>  5:  15114        15114
#>  6:  15971        15971
#>  7:  16513        16513
#>  8:  16782        16782
#>  9:  17906        17906
#> 10:  18886        18886
#> 11:  20267        20267
#> 12:  24958        24958
#> 13:  29286        29286
#> 14:  31990        31990
#> 15:  32057        32057
#> 16:  36585        36585

#identical
identical(DT[,manual], DT[,amp_alphadiv])
#> [1] TRUE

^{Created on 2021-11-17 by the reprex package (v2.0.1)}

phxu452 commented 3 years ago

Thanks very much. I will check for dada2 process then.

KasperSkytte / ampvis2

Unmatch raw reads that exporting from DADA2 to importing to Ampvis2 #127