Closed YiweiNiu closed 5 years ago
Hi. I can take a look after my holidays ends after next week.
Regards / MVH Kasper Skytte Andersen
On 5 Jul 2019, at 10.41, Yiwei Niu notifications@github.com wrote:
Hi,
I tried to load phyloseq object.
library(ampvis2) otutable <- data.frame(OTU = rownames(t(otu_table(ps)@.Data)), t(otu_table(ps)@.Data), tax_table(ps)@.Data, check.names = FALSE )
Extract metadata from the phyloseq object:
metadata <- data.frame(sample_data(ps), check.names = FALSE )
Load the data with amp_load:
ps.av2 <- amp_load(otutable, metadata, tree = phy_tree(ps)) But got the following error.
Error in
.rowNamesDF<-
(x, value = value) : duplicate 'row.names' are not allowed
- stop("duplicate 'row.names' are not allowed")
.rowNamesDF<-
(x, value = value)row.names<-.data.frame
(*tmp*
, value = value)row.names<-
(*tmp*
, value = value)rownames<-
(*tmp*
, value = as.character(metadata[, 1]))- amp_load(otutable, metadata, tree = phy_tree(ps)) I did not see any duplicates in the row names of otutable or metadata.
dim(otutable) [1] 522 13 dim(metadata) [1] 5 4 head(otutable) OTU S6-1-1 S6-1-2 S6-1-3 S6-10-1 S6-5-2 Kingdom Phylum Class Order ASV1 ASV1 1033 1910 657 101 47 Bacteria Firmicutes Clostridia Clostridiales ASV2 ASV2 759 964 369 62 30 Bacteria Firmicutes Clostridia Clostridiales ASV3 ASV3 535 932 291 54 19 Bacteria Firmicutes Clostridia Clostridiales ASV4 ASV4 406 1027 294 33 25 Bacteria Firmicutes Clostridia Clostridiales ASV5 ASV5 400 494 230 25 8 Bacteria Firmicutes Clostridia Clostridiales ASV6 ASV6 279 594 162 24 29 Bacteria Firmicutes Clostridia Clostridiales Family Genus Species ASV1 Ruminococcaceae
ASV2 Ruminococcaceae ASV3 Ruminococcaceae ASV4 Lachnospiraceae Lachnospira ASV5 Ruminococcaceae Faecalibacterium ASV6 Lachnospiraceae Lachnospira pectinoschiza length(rownames(otutable)) [1] 522 length(unique(rownames(otutable))) [1] 522 rownames(metadata) [1] "S6-1-1" "S6-1-2" "S6-1-3" "S6-10-1" "S6-5-2" Here is the R session information. sessionInfo() R version 3.6.0 (2019-04-26) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale: [1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] ampvis2_2.4.11 ggplot2_3.2.0 phyloseq_1.28.0
[4] DESeq2_1.24.0 SummarizedExperiment_1.14.0 DelayedArray_0.10.0
[7] BiocParallel_1.17.18 matrixStats_0.54.0 Biobase_2.44.0
[10] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0 Rcpp_1.0.1
[13] Biostrings_2.52.0 XVector_0.24.0 IRanges_2.18.1
[16] S4Vectors_0.22.0 BiocGenerics_0.30.0loaded via a namespace (and not attached): [1] nlme_3.1-140 bitops_1.0-6 lubridate_1.7.4 bit64_0.9-7
[5] httr_1.4.0 doParallel_1.0.14 RColorBrewer_1.1-2 tools_3.6.0
[9] backports_1.1.4 vegan_2.5-5 R6_2.4.0 rpart_4.1-15
[13] mgcv_1.8-28 Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.2
[17] colorspace_1.4-1 permute_0.9-5 ade4_1.7-13 nnet_7.3-12
[21] withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3 bit_1.1-14
[25] compiler_3.6.0 cli_1.1.0 htmlTable_1.13.1 network_1.15
[29] plotly_4.9.0 scales_1.0.0 checkmate_1.9.3 genefilter_1.66.0
[33] ggnet_0.1.0 stringr_1.4.0 digest_0.6.19 Rsamtools_2.0.0
[37] foreign_0.8-71 base64enc_0.1-3 pkgconfig_2.0.2 htmltools_0.3.6
[41] htmlwidgets_1.3 rlang_0.4.0 rstudioapi_0.10 RSQLite_2.1.1
[45] hwriter_1.3.2 jsonlite_1.6 acepack_1.4.1 dplyr_0.8.2
[49] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.1 Formula_1.2-3
[53] biomformat_1.12.0 Matrix_1.2-17 munsell_0.5.0 Rhdf5lib_1.6.0
[57] ape_5.3 stringi_1.4.3 MASS_7.3-51.4 zlibbioc_1.30.0
[61] rhdf5_2.28.0 plyr_1.8.4 grid_3.6.0 blob_1.1.1
[65] ggrepel_0.8.1 crayon_1.3.4 lattice_0.20-38 cowplot_0.9.4
[69] splines_3.6.0 multtest_2.40.0 annotate_1.62.0 locfit_1.5-9.1
[73] knitr_1.23 pillar_1.4.2 igraph_1.2.4.1 geneplotter_1.62.0
[77] reshape2_1.4.3 codetools_0.2-16 XML_3.98-1.20 glue_1.3.1
[81] latticeExtra_0.6-28 remotes_2.1.0 data.table_1.12.2 RcppParallel_4.4.3
[85] foreach_1.4.4 tidyr_0.8.3 gtable_0.3.0 purrr_0.3.2
[89] assertthat_0.2.1 xfun_0.8 xtable_1.8-4 viridisLite_0.3.0
[93] survival_2.44-1.1 tibble_2.1.3 iterators_1.0.10 GenomicAlignments_1.20.1 [97] AnnotationDbi_1.46.0 memoise_1.1.0 cluster_2.1.0 Do you have any ideas about this?Bests, Yiwei Niu
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.
Hi again. You can check for duplicates with any(duplicates(rownames(otutable)))
, and see which are duplicates with which(duplicated(rownames(otutable)))
. There are duplicates somewhere, or else you wouldn't get that error. Maybe there are NA's or similar?
I'm also having this issue- checking with any(duplicates(rownames(otutable)))
and which(duplicated(rownames(otutable)))
results in integer(0)
and [1] FALSE
, respectively.
The below is a kludge adapted from this gist that works for DADA2 clustered phyloseq objects https://gist.github.com/shandley/87b0017ca593c58561a8b76b98985477 and appears to solve this issue, at least for me.
otu.table <- t(otu_table(ps, taxa_are_rows = FALSE))
otu.table <- as.data.frame(otu.table)
otu.table <- tibble::rownames_to_column(otu.table, var = "OTU")
head(rownames(otu.table))
head(colnames(otu.table))
nrow(otu.table)
tax.table <- as.data.frame(tax_table(ps))
tax.table <- tibble::rownames_to_column(tax.table, var = "OTU")
head(rownames(tax.table))
head(colnames(tax.table))
nrow(tax.table)
my_otu_table <- left_join(otu.table, tax.table, by = "OTU")
nrow(my_otu_table)
colnames(my_otu_table)
head(rownames(my_otu_table))
my_otu_table <- tibble::column_to_rownames(my_otu_table, var = "OTU")
head(rownames(my_otu_table))
colnames(my_otu_table)
nrow(my_otu_table)
my_metadata <- data.frame(phyloseq::sample_data(ps),
check.names = FALSE
)
my_metadata <- tibble::rownames_to_column(my_metadata, "SampleID")
AV.Object <- amp_load(otutable = my_otu_table, metadata = my_metadata, tree = fitGTR$tree)
AV.Object
Thanks. What does taxa_are_rows(ps)
say? I think you just have to skip transposing
Can you test if this function works on your data directly? I haven't been able to find example data where taxa_are_rows = FALSE
. Or do you mind if I play around with your data to test it?
#install ampvis2 with devtools::install_github("madsalbertsen/ampvis2")
#install phyloseq with BiocManager::install("phyloseq)
data("GlobalPatterns", package = "phyloseq")
#function to convert a phyloseq object to an ampvis2 object
phyloseq_to_ampvis2 <- function(physeq) {
#check object for class
if(!any(class(physeq) %in% "phyloseq"))
stop("physeq object must be an object of class \"phyloseq\"", call. = FALSE)
#ampvis2 requires taxonomy and abundance table, phyloseq checks for the latter
if(is.null(physeq@tax_table))
stop("No taxonomy found in the phyloseq object", call. = FALSE)
#OTUs must be in rows, not columns
if(phyloseq::taxa_are_rows(physeq))
abund <- as.data.frame(phyloseq::otu_table(physeq)@.Data)
else
abund <- as.data.frame(t(phyloseq::otu_table(physeq)@.Data))
#tax_table is assumed to have OTUs in rows too
tax <- phyloseq::tax_table(physeq)@.Data
#merge by rownames (OTUs)
otutable <- merge(
abund,
tax,
by = 0,
all.x = TRUE,
all.y = FALSE,
sort = FALSE
)
colnames(otutable)[1] <- "OTU"
#extract sample_data (metadata)
if(!is.null(physeq@sam_data)) {
metadata <- data.frame(
phyloseq::sample_data(physeq),
row.names = phyloseq::sample_names(physeq),
stringsAsFactors = FALSE,
check.names = FALSE
)
} else
metadata <- NULL
#extract phylogenetic tree, assumed to be of class "phylo"
if(!is.null(physeq@phy_tree)) {
tree <- phyloseq::phy_tree(physeq)
} else
tree <- NULL
#extract OTU DNA sequences, assumed to be of class "XStringSet"
if(!is.null(physeq@refseq)) {
#convert XStringSet to DNAbin using a temporary file (easiest)
fastaTempFile <- tempfile(pattern = "ampvis2_", fileext = ".fa")
Biostrings::writeXStringSet(physeq@refseq, filepath = fastaTempFile)
} else
fastaTempFile <- NULL
#load as normally with amp_load
ampvis2::amp_load(
otutable = otutable,
metadata = metadata,
tree = tree,
fasta = fastaTempFile
)
}
d <- phyloseq_to_ampvis2(GlobalPatterns)
d
#> ampvis2 object with 4 elements.
#> Summary of OTU table:
#> Samples OTUs Total#Reads Min#Reads Max#Reads
#> 26 19216 28216678 58688 2357181
#> Median#Reads Avg#Reads
#> 1106849 1085256.85
#>
#> Assigned taxonomy:
#> Kingdom Phylum Class Order Family
#> 19216(100%) 18974(98.74%) 18887(98.29%) 17460(90.86%) 13599(70.77%)
#> Genus Species
#> 7968(41.47%) 1413(7.35%)
#>
#> Metadata variables: 7
#> X_SampleID, Primer, Final_Barcode, Barcode_truncated_plus_T, Barcode_full_length, SampleType, Description
Created on 2019-08-21 by the reprex package (v0.3.0)
taxa_are_rows(ps)
does return as FALSE, and I can't get the above example to successfully run on my data (Which I cannot share right now, my apologies)- it produces the same error as before of duplicate 'row.names' are not allowed
. I'll try and locate some sharable data that flows through the DADA2 workflow that also produces this error...
I see I made a mistake in the phyloseq_to_ampvis2()
function, which is corrected now. It used GlobalPatterns
inside the function instead of the physeq
data provided by the user. Try again.
I just tried with some samples from a 2018 project, ran it through the dada2 tutorial and it loads fine.
@YiweiNiu Do you mind sharing your data so I can debug?
I can try this at work tomorrow, will let you know ASAP. My apologies for delays.
Absolutely no rush! I'm most likely busy next week anyways. Thanks for contributing
Hi,
Sorry for the late reply.
The code, R, R packages are still the same as above. And still got the same error.
> any(duplicated(rownames(otutable)))
[1] FALSE
> which(duplicated(rownames(otutable)))
integer(0)
> taxa_are_rows(ps)
[1] FALSE
I would like to share my data. Here is the phyloseq object I used niuyw_ps.zip.
Bests, Yiwei Niu
Thank you @YiweiNiu. Good with some data. I now see that the problem is that sample ID's are rownames in the phyloseq metadata, but ampvis2 expects them to be in the first column. I'll make a solution to this soon.
Glad to know that. Very looking forward to the new version.
Here is a revised function. Works with your data @YiweiNiu. @bstamps can you confirm if it works with your data? I'll likely store the function as a gist and never include it in ampvis2 to avoid phyloseq as a dependency, ampvis2 is meant to be independent of any bioconductor packages.
phyloseq_to_ampvis2 <- function(physeq) {
#check object for class
if(!any(class(physeq) %in% "phyloseq"))
stop("physeq object must be of class \"phyloseq\"", call. = FALSE)
#ampvis2 requires taxonomy and abundance table, phyloseq checks for the latter
if(is.null(physeq@tax_table))
stop("No taxonomy found in the phyloseq object and is required for ampvis2", call. = FALSE)
#OTUs must be in rows, not columns
if(phyloseq::taxa_are_rows(physeq))
abund <- as.data.frame(phyloseq::otu_table(physeq)@.Data)
else
abund <- as.data.frame(t(phyloseq::otu_table(physeq)@.Data))
#tax_table is assumed to have OTUs in rows too
tax <- phyloseq::tax_table(physeq)@.Data
#merge by rownames (OTUs)
otutable <- merge(
abund,
tax,
by = 0,
all.x = TRUE,
all.y = FALSE,
sort = FALSE
)
colnames(otutable)[1] <- "OTU"
#extract sample_data (metadata)
if(!is.null(physeq@sam_data)) {
metadata <- data.frame(
phyloseq::sample_data(physeq),
row.names = phyloseq::sample_names(physeq),
stringsAsFactors = FALSE,
check.names = FALSE
)
#check if any columns match exactly with rownames
#if none matched assume row names are sample identifiers
samplesCol <- unlist(lapply(metadata, function(x) {
identical(x, rownames(metadata))}))
if(any(samplesCol)) {
#error if a column matched and it's not the first
if(!samplesCol[[1]])
stop("Sample ID's must be in the first column in the sample metadata, please reorder", call. = FALSE)
} else {
#assume rownames are sample identifiers, merge at the end with name "SampleID"
if(any(colnames(metadata) %in% "SampleID"))
stop("A column in the sample metadata is already named \"SampleID\" but does not seem to contain sample ID's", call. = FALSE)
metadata$SampleID <- rownames(metadata)
#reorder columns so SampleID is the first
metadata <- metadata[, c(which(colnames(metadata) %in% "SampleID"), 1:(ncol(metadata)-1L)), drop = FALSE]
}
} else
metadata <- NULL
#extract phylogenetic tree, assumed to be of class "phylo"
if(!is.null(physeq@phy_tree)) {
tree <- phyloseq::phy_tree(physeq)
} else
tree <- NULL
#extract OTU DNA sequences, assumed to be of class "XStringSet"
if(!is.null(physeq@refseq)) {
#convert XStringSet to DNAbin using a temporary file (easiest)
fastaTempFile <- tempfile(pattern = "ampvis2_", fileext = ".fa")
Biostrings::writeXStringSet(physeq@refseq, filepath = fastaTempFile)
} else
fastaTempFile <- NULL
#load as normally with amp_load
ampvis2::amp_load(
otutable = otutable,
metadata = metadata,
tree = tree,
fasta = fastaTempFile
)
}
Can confirm this worked with my dataset. Thank you for working through this. I know you have your own data model and don't want phyloseq as a dependency of Ampvis2, but the gist will certainly be helpful for those of us that move between the two!
Great, glad it worked. I'll link it in the FAQ on the website.
Hello everyone,
I hope I can reopen this issue again. I have the same problem using my phyloseq data in the amp_load function. I used the provided function developed by @KasperSkytte and applied it to my phyloseq object. So far, I couldn't solve the Error message that I got when I applied the function:
phyloseq_to_ampvis2(seq_MR) Error in colSums(abund) : 'x' must be numeric 3. colSums(abund) 2. ampvis2::amp_load(otutable = otutable, metadata = metadata, tree = tree, fasta = fastaTempFile) 1. phyloseq_to_ampvis2(phyloseq)
My phyloseq object "seq_MR" is rarefied and subsetted. My seq_MR OTU table is numeric:
> is.numeric(seq_MR@otu_table) [1] TRUE
Of course, in the function process, the tax_table data is added to the otu_table, but that appeared to be no issue in the previous experiences shared here.
If you would like me to share my data, let me know! I would be very happy for some help & suggestions - Thank you! Marcellina
Hi @marcellinaR
Can you provide a reproducible example please?
Hi thank you for your quick reply!
I am using R Studio (Version 2023.06.2+561) on my MacOS Ventura 13.3.1. This is my phyloseq object: seq_MR.rds.zip
I copied your function and applied this on my phyloseq object "seq_MR", as shown above. The function worked well on the provided data by YiweiNiu, but I received the already shared Error working with my data set:
> dM <- phyloseq_to_ampvis2(seq_MR)
Error in colSums(abund) : 'x' must be numeric
3.
colSums(abund)
2.
ampvis2::amp_load(otutable = otutable, metadata = metadata, tree = tree,
fasta = fastaTempFile)
1.
phyloseq_to_ampvis2(seq_MR)
The difference that I noticed comparing my otu and tax_table to YiweiNiu's is that her columnnames contain sequences. I have OTUs as rows and sampleIDs/taxonomy as columns:
> dim(seq_MR@otu_table)
[1] 1400 84
> seq_MR@otu_table
OTU Table: [1400 taxa and 84 samples]
taxa are rows
MW007.fastq_silva MW008.fastq_silva MW009.fastq_silva MW011.fastq_silva MW013.fastq_silva MW016.fastq_silva MW017.fastq_silva
3 34 33 34 84 35 29 71
43656 0 0 6 0 0 0 0
43534 3 2 1 0 0 0 1
43481 0 0 0 0 0 0 0
46449 30 20 26 76 93 24 23
50242 8 9 12 11 5 2 6
50471 3345 1380 1136 46 13 621 3839
50381 17268 18173 18558 19940 22360 22335 15875
46260 1 0 0 1 0 0 2
> seq_MR@tax_table
Taxonomy Table: [1400 taxa by 9 taxonomic ranks]:
Kingdom Phylum Class Order
3 "Bacteria" "unclassified" "unclassified" "unclassified"
43656 "Bacteria" "Actinobacteriota" "Actinobacteria" "Propionibacteriales"
43534 "Bacteria" "Actinobacteriota" "Actinobacteria" "Micrococcales"
43481 "Bacteria" "Actinobacteriota" "Actinobacteria" "Micrococcales"
46449 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
50242 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
50471 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Pseudomonadales"
Thank you!
The issue is the format of the taxonomy. It's weird. I'll make some checks and update. 2 min
Since this issue and the gist mentioned in the FAQ amp_load
has matured quite a bit and now include various checks for the formats of the individual objects provided. So I decided to do this properly now. As mentioned earlier the phyloseq package will not be a required package by ampvis2, but by just returning an error instead if it's not installed it's now possible to provide a phyloseq
class object directly to amp_load
. It's as simple as: amp_load(phyloseq_object)
. Update to the recent version 2.8.4 for it to be available.
@marcellinaR In your case you need to ensure a "OTU" column is present in the taxonomy. It's required.
Alright, thank you I had version ‘2.8.3’. I now have packageVersion ‘2.8.4’.
I have tried to modify my taxonomy table and my OTU table by adding an OTU column before and reading everything in the amp_load()
function as data frames. I received the duplicate row.names error.
I now tried to change my OTU and taxonomy table and created a new phyloseq object to apply the amp_load()
function. But phyloseq does not take an OTU table containing non-numeric values.
otu_tab <- otu_table(seq_MR)
otu_tab <- as.data.frame(otu_tab)
otu_tab$OTU <- rownames(otu_tab)
otu_tab <- as.matrix(otu_tab)
otu_table = otu_table(otu_tab, taxa_are_rows = TRUE, errorIfNULL=TRUE)
Error in validObject(.Object) : invalid class “otu_table” object:
Non-numeric matrix provided as OTU table.
Abundance is expected to be numeric.
However, I can add an OTU column only to my tax_table and keep the OTU table and sample_data unchanged. Applying the amp_load()
function here, I only receive the warning "Could not find a column named OTU/ASV in otutable, using rownames as OTU ID's "
> # prepare data with additional "OTU" and "SampleID" columns as required for amp_load()
> otu_tab <- otu_table(seq_MR)
> meta_tab <- sample_data(seq_MR)
> tax_tab <- tax_table(seq_MR)
> tax_tab <- as.data.frame(tax_tab)
> tax_tab$OTU <- rownames(tax_tab)
> tax_tab <- as.matrix(tax_tab)
> otu_table = otu_table(otu_tab, taxa_are_rows = TRUE, errorIfNULL=TRUE)
> sam_data = sample_data(meta_tab, errorIfNULL = TRUE)
> tax_table = tax_table(tax_tab, errorIfNULL=TRUE)
> # make phyloseq
> seq_MR_amp = phyloseq(sam_data,otu_table,tax_table)
> # load in ampvis2 object
> amp_test <- amp_load(seq_MR_amp)
Phyloseq object provided. Ignoring anything provided for the metadata, taxonomy, fasta, or tree arguments, using those from the phyloseq object instead.
Warning message:
Could not find a column named OTU/ASV in otutable, using rownames as OTU ID's
The same appears if I simply use my entire phyloseq object.
> amp_MR <- amp_load(seq_MR)
Phyloseq object provided. Ignoring anything provided for the metadata, taxonomy, fasta, or tree arguments, using those from the phyloseq object instead.
Warning messages:
1: Could not find a column named OTU/ASV in otutable, using rownames as OTU ID's
2: Could not find a column named OTU/ASV in taxonomy, using rownames as OTU ID's
If I look at the structure of my ampvis2 object ("amp_MR"), everything looks alright I would say. Therefore, I might have a super simple and stupid question now, but how can I have a look at my e.g. OTU table as I would do with my phyloseq object using seq_MR@otu_table
or: head(seq_MR@otu_table)
? To check if OTU IDs etc. are present and if the object is good to use for analysis? This is my ampvis2 object:
> amp_MR
ampvis2 object with 3 elements.
Summary of OTU table:
Samples OTUs Total#Reads Min#Reads Max#Reads Median#Reads Avg#Reads
84 1400 2117556 25209 25209 25209 25209
Assigned taxonomy:
Kingdom Phylum Class Order Family Genus Species
1400(100%) 1398(99.86%) 1400(100%) 1398(99.86%) 1395(99.64%) 1398(99.86%) 1400(100%)
Metadata variables: 34
SampleID, approach, kingdom, phylum, class, order, family, genus, species, taxon, order_tax, NMS_time, NMS_time_txt, variant, mother_colony, collection_date, depth_small, depth_group, depth_m, depth_fathoms, lat, long, location, geo_group, extraction_date, control_ID, qubit_br, qubit_hs, nanodrop, A_260_280, A_260_230, fixative, loc_time, time_tac
Thanks again for your help!!
It looks like you are loading things through phyloseq first, then import with amp_load
. Why not just skip phyloseq? Looks over-complicated to me. The taxonomy you provided here is a mess, I would go upstream and fix things. You must have some proper OTU/ASV names from the processing workflow. You also have "unclassified" fields everywhere. But in the end a warning is just a warning. You can continue with analysis as-is. An ampvis2 object is just a list, use View()
or similar.
So far, my analysis worked well using my data in a phyloseq format, or alternatively usingpsmelt()
prior to analysis. But I extracted my data into data frames now and adjusted the structure - as expected, no warnings anymore.
The OTU& taxonomic table was provided to me in a biom format for my master thesis, which I then used for further processing and analysis. Sequencing was conducted using the MinION. Kranken2.1.2 in conjunction with the SILVA reference database (138.1 release) was used for taxonomic assignment (following: DOI: 10.1111/mec.16462). Further processing, rarefaction etc. of the data was done with phyloseq.
I know this is a bit off-topic now, but if you have the additional minutes, I would be very thankful to hear your suggestions to solve the "mess" in my current data and also to improve my analysis in the future. Which part of the upstream analysis should have been done differently based on the information I provided?
Anyways, thank you for your help, I am happy to explore ampvis2 now.
You can also load biom format data directly into ampvis2. With mess I mean the column names are not all taxonomic levels (Phy_Fam and OTU_tax, should only contain Kingdom -> Species+OTU) and if you have "unclassified" entries everywhere they are going to be considered a single taxon, even though they are not. R doesn't know about bacterial names, so it's up to you to do a sanity check first. You'll otherwise get inflated abundances with a mysterious bacterium called "unclassified", when they are in fact from many different unrelated species. But I assume you have been using SILVA. It's notoriously bad with regards to weird names and placeholders in the actual taxonomy, so you should be removing those first. There are others, you can gsub
all columns with a pattern like "unclassified|uncultured|unknown|unidentified|incertae sedis|metagenome|\\bbacterium\\b|\\bpossible\\b"
. If they are environmental samples from sludge or digesters I highly recommend the MiDAS database, which I have co-authored. With nanopore reads we almost always get species classifications for 99%+ with just a raw minimap2 mapping. Script here for inspiration https://github.com/cmc-aau/fl16S_nanopore/.
Alright! Thank you for your help and your suggestions, I will apply that to clean my data. Thanks also, for providing the script and recommending MiDAS as database - I will look into that!
Hi,
I tried to load phyloseq object.
But got the following error.
I did not see any duplicates in the row names of
otutable
ormetadata
.Here is the R session information.
Do you have any ideas about this?
Bests, Yiwei Niu