Can you clarify where you are getting an error?
Are you getting an error when running assignTaxonomy with the GreenGenes reference? Or does the error only crop up later, after the tax_table is merged into a phyloseq object?
I get the error while making the "top.20" plot. This is the matrix I get after assigning taxa:
     [,1]          [,2]               [,3]             [,4]               [,5]       [,6]  [,7]
[1,] "k__Bacteria" "p__Bacteroidetes" "c__Bacteroidia" "o__Bacteroidales" "f__S24-7" "g__" "s__"
[2,] "k__Bacteria" "p__Bacteroidetes" "c__Bacteroidia" "o__Bacteroidales" "f__S24-7" "g__" "s__"
[3,] "k__Bacteria" "p__Bacteroidetes" "c__Bacteroidia" "o__Bacteroidales" "f__S24-7" "g__" "s__"
I know I could use partial matching to work around the problem; I just want to make sure it's not a problem with the file. I already downloaded it again to see if that would fix it, but it didn't.
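The "k__", "p__", ... prefixes and the empty "g__"/"s__" entries in the matrix above come from the GreenGenes training file's naming scheme. If you want labels that look like the RDP/Silva output, one option is the minimal base-R sketch below. It is an optional cleanup and is not needed to fix the plotting error. The helper name `strip_gg_prefixes` is hypothetical and is not part of dada2 or phyloseq. The one-row matrix is a stand-in copied from the output posted above.

```r
# Hypothetical helper (not part of dada2 or phyloseq): strip GreenGenes rank
# prefixes such as "k__" and "f__", and mark ranks the reference left empty
# (e.g. "g__", "s__") as NA.
strip_gg_prefixes <- function(taxa) {
  taxa[] <- sub("^[kpcofgs]__", "", taxa)   # `taxa[] <-` keeps the matrix shape
  taxa[which(taxa == "")] <- NA_character_  # empty rank -> NA
  taxa
}

# One-row stand-in for the GreenGenes output posted above.
taxa <- matrix(
  c("k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales",
    "f__S24-7", "g__", "s__"),
  nrow = 1
)
strip_gg_prefixes(taxa)
```

Using `which()` for the NA step means any NA values already present in the matrix are skipped, not turned into subscript NAs.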
Is there an NA in the family column in the top 20 taxa in your data?
There don't seem to be any (this is the tutorial's data set, with everything else unchanged):
ps.top20
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 20 taxa and 19 samples ]
sample_data() Sample Data:       [ 19 samples by 4 sample variables ]
tax_table()   Taxonomy Table:    [ 20 taxa by 7 taxonomic ranks ]
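An error saying "Family" is not found can mean two different things. The column may contain NA values, or the column may not exist under that name at all. The minimal base-R sketch below tells the two apart. It assumes you pass it the taxonomy matrix, for a phyloseq object presumably `tax_table(ps.top20)`, but the sketch itself uses only base R. The helper `count_rank_na` is hypothetical.

```r
# Hypothetical helper: fail early if the rank column is missing,
# otherwise report how many NA entries it has.
count_rank_na <- function(taxa, rank = "Family") {
  if (!(rank %in% colnames(taxa))) {
    stop(sprintf("no '%s' column; columns are: %s", rank,
                 paste(colnames(taxa), collapse = ", ")))
  }
  sum(is.na(taxa[, rank]))
}

# One-row stand-in for the GreenGenes taxonomy matrix.
named <- matrix(c("k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia",
                  "o__Bacteroidales", "f__S24-7", "g__", "s__"), nrow = 1)
colnames(named) <- c("Kingdom", "Phylum", "Class", "Order",
                     "Family", "Genus", "Species")
unnamed <- unname(named)  # same data, no column names

count_rank_na(named, "Family")
# count_rank_na(unnamed, "Family") stops with "no 'Family' column ..."
```

Here the column-name check fails while the NA count is zero. That points at the column names rather than the data.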
I see, you are following the tutorial but switching in the GG reference.
In the tutorial, immediately after running assignTaxonomy(...), there is a command that names the columns of the taxa matrix by taxonomic rank. Change that to:
colnames(taxa) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
The command in the tutorial (the same, except without "Species") fails with the GG reference because GG assigns taxonomy down to the species level rather than stopping at the genus level.
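In base R, assigning a vector of column names whose length differs from the number of columns is an error, and the matrix is left without column names. The minimal sketch below shows that failure on a one-row stand-in for the GG output. It then names the columns from however many there are, so the same line should work for both 6-column (RDP/Silva) and 7-column (GG) matrices. The helper `name_rank_columns` is hypothetical and is not part of dada2.

```r
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical helper: use as many rank names as the matrix has columns.
name_rank_columns <- function(taxa, ranks) {
  if (ncol(taxa) > length(ranks)) stop("more columns than rank names")
  colnames(taxa) <- ranks[seq_len(ncol(taxa))]
  taxa
}

# One-row stand-in for the 7-column GreenGenes output.
gg <- matrix(c("k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia",
               "o__Bacteroidales", "f__S24-7", "g__", "s__"), nrow = 1)

# Six names for seven columns: R refuses the assignment, so gg stays unnamed.
res <- tryCatch({ colnames(gg) <- ranks[1:6]; "ok" },
                error = function(e) conditionMessage(e))

# Naming from ncol() works regardless of how deep the reference goes.
gg <- name_rank_columns(gg, ranks)
colnames(gg)
```

For a 6-column matrix from the RDP or Silva training files, the same call stops at "Genus".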
That did it! Thanks a lot, and thanks for the hard work developing this pipeline.
Hi,
I can run assignTaxonomy on the DADA2 tutorial's data set using the RDP and Silva training files, but when I use the GreenGenes one (gg_13_8_train_set_97.fa.gz), the phyloseq taxa abundance plot fails with the error "family"...not found. Any idea what's going on?
Here's the session info (Ubuntu 14.04):
locale:
 [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8
 [6] LC_MESSAGES=en_CA.UTF-8 LC_PAPER=en_CA.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] ggplot2_2.1.0 ShortRead_1.28.0 GenomicAlignments_1.6.3 SummarizedExperiment_1.0.2 Biobase_2.30.0
 [6] Rsamtools_1.22.0 GenomicRanges_1.22.4 GenomeInfoDb_1.6.3 Biostrings_2.38.4 XVector_0.10.0
[11] IRanges_2.4.8 S4Vectors_0.8.11 BiocParallel_1.4.3 BiocGenerics_0.16.1 phyloseq_1.14.0
[16] dada2_0.99.10 Rcpp_0.12.4 devtools_1.11.1 BiocInstaller_1.20.1

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 iterators_1.0.8 bitops_1.0-6 futile.options_1.0.0 tools_3.2.5
 [8] zlibbioc_1.16.0 digest_0.6.9 nlme_3.1-127 memoise_1.0.0 gtable_0.2.0 lattice_0.20-33 mgcv_1.8-12
[15] igraph_1.0.1 Matrix_1.2-5 foreach_1.4.3 cluster_2.0.4 withr_1.0.1 hwriter_1.3.2 stringr_1.0.0
[22] multtest_2.26.0 ade4_1.7-4 grid_3.2.5 data.table_1.9.6 survival_2.39-2 RJSONIO_1.3-0 latticeExtra_0.6-28
[29] reshape2_1.4.1 lambda.r_1.1.7 magrittr_1.5 MASS_7.3-45 splines_3.2.5 codetools_0.2-14 scales_0.4.0
[36] permute_0.9-0 ape_3.4 colorspace_1.2-6 stringi_1.0-1 munsell_0.4.3 biom_0.3.12 vegan_2.3-5
[43] chron_2.3-47
Thanks in advance for any insights,
Oscar.