Closed chymoncada closed 3 months ago
That does seem wrong. Does the example in the documentation work correctly for you?
Looking at your output, it seems that the Date_Fraction
column is being converted to NA
. What type is this column in your phyloseq object?
Could you provide a minimal reproducible example with your issue? This makes it a lot easier for me to inspect what is going on.
Yes, the GlobalPatterns example works for me. Grouping that by SampleType shows me the top n taxa for each sample type.
The Date_Fraction column is a factor, so I think that's fine?
I've emailed you a subset of my phyloseq object since I cant seem to upload .rds files here. That one has 6 different groups for the Date_Fraction parameter. Thanks!
Thanks for checking that. I did not receive an email (and to be honest, I would rather not be emailed data), could you try an alternative (e.g. dput
)?
In the meantime, I've checked with GlobalPatterns if there are any issues with your hypothetical setup. As far as I can tell, the function should be working as intended:
require("fantaxtic")
#> Loading required package: fantaxtic
require("phyloseq")
#> Loading required package: phyloseq
data(GlobalPatterns)
sample_data(GlobalPatterns)$SampleType <- as.factor(sample_data(GlobalPatterns)$SampleType)
top_asv <- top_taxa(GlobalPatterns, n_taxa = 2, grouping = "SampleType", tax_level="Class", by_proportion=F, include_na_taxa = T)
top_asv$top_taxa
#> SampleType tax_rank taxid abundance Kingdom Phylum
#> 1 Freshwater 1 329744 546306.50 Bacteria Actinobacteria
#> 2 Freshwater (creek) 1 549656 1090473.00 Bacteria Cyanobacteria
#> 3 Freshwater 2 279599 522626.50 Bacteria Cyanobacteria
#> 4 Freshwater (creek) 2 360229 112393.67 Bacteria Proteobacteria
#> 5 Tongue 1 360229 381557.50 Bacteria Proteobacteria
#> 6 Sediment (estuary) 1 94166 98640.00 Bacteria Proteobacteria
#> 7 Skin 2 94166 81813.33 Bacteria Proteobacteria
#> 8 Tongue 2 94166 283849.00 Bacteria Proteobacteria
#> 9 Ocean 2 534609 216527.67 Bacteria Proteobacteria
#> 10 Soil 1 534609 127965.33 Bacteria Proteobacteria
#> 11 Sediment (estuary) 2 319044 92608.67 Bacteria Proteobacteria
#> 12 Ocean 1 339261 398276.67 Bacteria Bacteroidetes
#> 13 Feces 2 331820 607768.00 Bacteria Bacteroidetes
#> 14 Mock 2 331820 271074.67 Bacteria Bacteroidetes
#> 15 Soil 2 256977 111549.00 Bacteria Verrucomicrobia
#> 16 Feces 1 189047 683201.75 Bacteria Firmicutes
#> 17 Mock 1 189047 413917.00 Bacteria Firmicutes
#> 18 Skin 1 98605 117793.67 Bacteria Firmicutes
#> Class Order Family Genus Species
#> 1 Actinobacteria <NA> <NA> <NA> <NA>
#> 2 Chloroplast <NA> <NA> <NA> <NA>
#> 3 Nostocophycideae <NA> <NA> <NA> <NA>
#> 4 Betaproteobacteria <NA> <NA> <NA> <NA>
#> 5 Betaproteobacteria <NA> <NA> <NA> <NA>
#> 6 Gammaproteobacteria <NA> <NA> <NA> <NA>
#> 7 Gammaproteobacteria <NA> <NA> <NA> <NA>
#> 8 Gammaproteobacteria <NA> <NA> <NA> <NA>
#> 9 Alphaproteobacteria <NA> <NA> <NA> <NA>
#> 10 Alphaproteobacteria <NA> <NA> <NA> <NA>
#> 11 Deltaproteobacteria <NA> <NA> <NA> <NA>
#> 12 Flavobacteria <NA> <NA> <NA> <NA>
#> 13 Bacteroidia <NA> <NA> <NA> <NA>
#> 14 Bacteroidia <NA> <NA> <NA> <NA>
#> 15 Spartobacteria <NA> <NA> <NA> <NA>
#> 16 Clostridia <NA> <NA> <NA> <NA>
#> 17 Clostridia <NA> <NA> <NA> <NA>
#> 18 Bacilli <NA> <NA> <NA> <NA>
That most likely means there is some incompatibility between your phyloseq object and fantaxtic. Could you show me your sample_data
like below?
require("phyloseq")
#> Loading required package: phyloseq
require("tidyverse")
#> Loading required package: tidyverse
data(GlobalPatterns)
sample_data(GlobalPatterns) %>% data.frame(.) %>% tibble(.)
#> # A tibble: 26 × 7
#> X.SampleID Primer Final_Barcode Barcode_truncat… Barcode_full_le… SampleType
#> <fct> <fct> <fct> <fct> <fct> <fct>
#> 1 CL3 ILBC_01 AACGCA TGCGTT CTAGCGTGCGT Soil
#> 2 CC1 ILBC_02 AACTCG CGAGTT CATCGACGAGT Soil
#> 3 SV1 ILBC_03 AACTGT ACAGTT GTACGCACAGT Soil
#> 4 M31Fcsw ILBC_04 AAGAGA TCTCTT TCGACATCTCT Feces
#> 5 M11Fcsw ILBC_05 AAGCTG CAGCTT CGACTGCAGCT Feces
#> 6 M31Plmr ILBC_07 AATCGT ACGATT CGAGTCACGAT Skin
#> 7 M11Plmr ILBC_08 ACACAC GTGTGT GCCATAGTGTG Skin
#> 8 F21Plmr ILBC_09 ACACAT ATGTGT GTAGACATGTG Skin
#> 9 M31Tong ILBC_10 ACACGA TCGTGT TGTGGCTCGTG Tongue
#> 10 M11Tong ILBC_11 ACACGG CCGTGT TAGACACCGTG Tongue
#> # … with 16 more rows, and 1 more variable: Description <fct>
Created on 2024-07-08 by the reprex package (v2.0.1)
Here is how to recreate small subset of my phyloseq object (10 samples, 5 ASVs)
> dput(ps_subset)
new("phyloseq", otu_table = new("otu_table", .Data = structure(c(0.0294286699620646,
0.000143694677549144, 0.0129612599149328, 0.000258650419588458,
0.0108633176227153, 0.0345451362548871, 0.000233413082803291,
0.0143549045924024, 0.000116706541401646, 0.012691836377429,
0.0117188644420356, 0.000439457416576334, 0.0129493452084493,
0.000380863094366156, 0.00339847068819031, 0.000967702915205032,
0.0518124269182694, 0.000846740050804403, 0.0480625781218499,
0.000524172412402726, 0.0082013108922631, 0.000988109746055795,
0.0110009551727545, 0.000955172754520602, 0.00260202233128026,
0.0338230269900922, 0.000310587942976054, 0.0139764574339224,
0.000341646737273659, 0.0123924589247445, 0.0414228879007723,
0.00014041656915516, 0.0172712380060847, 0.00016381933068102,
0.0139012403463609, 0.0121970663265306, 0.000637755102040816,
0.0152264030612245, 0.00123565051020408, 0.00418526785714286,
0.000353186707336651, 0.0500882966768342, 0.000417402472306951,
0.0768341627869642, 0.000288970942366351, 0.00403329891584925,
0.00622741352607124, 0.000677594217862674, 0.000903458957150232,
0.00245224574083634), dim = c(5L, 10L), dimnames = list(c("ASV_1",
"ASV_3", "ASV_4", "ASV_5", "ASV_6"), c("08-bulk", "08-FA", "08-LA",
"08-OSW", "08-PW", "09-bulk", "09-FA", "09-LA", "09-OSW", "09-PW"
))), taxa_are_rows = TRUE), tax_table = new("taxonomyTable",
.Data = structure(c("Bacteria", "Bacteria", "Bacteria", "Bacteria",
"Bacteria", "Bacteroidota", "Proteobacteria", "Bacteroidota",
"Bacteroidota", "Proteobacteria", "Bacteroidia", "Alphaproteobacteria",
"Bacteroidia", "Bacteroidia", "Gammaproteobacteria", "Flavobacteriales",
"SAR11 clade", "Flavobacteriales", "Flavobacteriales", "Steroidobacterales",
"Flavobacteriaceae", "Clade I", "Flavobacteriaceae", "Flavobacteriaceae",
"Woeseiaceae", "Maribacter", "Clade Ia", "Aquibacter", "Polaribacter",
"Woeseia", NA, NA, NA, NA, NA), dim = c(5L, 7L), dimnames = list(
c("ASV_1", "ASV_3", "ASV_4", "ASV_5", "ASV_6"), c("Kingdom",
"Phylum", "Class", "Order", "Family", "Genus", "Species"
)))), sam_data = new("sample_data", .Data = list(c("08-bulk",
"08-FA", "08-LA", "08-OSW", "08-PW", "09-bulk", "09-FA", "09-LA",
"09-OSW", "09-PW"), c("Sediment_08", "Sediment_08", "Sediment_08",
"Seawater_08", "Sediment_08", "Sediment_09", "Sediment_09", "Sediment_09",
"Seawater_09", "Sediment_09"), c("Bulk", "FA", "LA", "OSW", "PW",
"Bulk", "FA", "LA", "OSW", "PW"), structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), levels = "2204", class = "factor"),
structure(c(5L, 4L, 3L, 1L, 2L, 5L, 4L, 3L, 1L, 2L), levels = c("2204_B_OSW",
"2204_C_PW", "2204_D_LA", "2204_E_FA", "2204_F_Bulk"), class = "factor"),
c("Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo",
"Zymo", "Zymo", "Zymo"), c("No", "No", "No", "No", "No",
"No", "No", "No", "No", "No")), names = c("Sample_name",
"Sample_group", "Fraction", "Sampling_date", "Date_Fraction",
"Extraction", "For_RNA"), row.names = c("08-bulk", "08-FA", "08-LA",
"08-OSW", "08-PW", "09-bulk", "09-FA", "09-LA", "09-OSW", "09-PW"
), .S3Class = "data.frame"), phy_tree = NULL, refseq = NULL)
My sample_data looks like:
> sample_data(rel_2123_core) %>% data.frame(.) %>% tibble(.)
# A tibble: 159 × 7
Sample_name Sample_group Fraction Sampling_date Date_Fraction Extraction For_RNA
<chr> <chr> <chr> <fct> <fct> <chr> <chr>
1 01-bulk4-Zy Sediment_1 Bulk 2112 2112_F_Bulk Zymo No
2 01-FA1-Zy Sediment_1 FA 2112 2112_E_FA Zymo No
3 01-LA1 Sediment_1 LA 2112 2112_D_LA Powerwater No
4 01-OSW1 Seawater_1 OSW 2112 2112_B_OSW Powerwater No
5 01-PW1 Sediment_1 PW 2112 2112_C_PW Powerwater No
6 01-SSW1 Seawater_1 SSW 2112 2112_A_SSW Powerwater No
7 02-bulk21-Zy Sediment_2 Bulk 2112 2112_F_Bulk Zymo No
8 02-FA11-Zy Sediment_2 FA 2112 2112_E_FA Zymo No
9 02-LA6 Sediment_2 LA 2112 2112_D_LA Powerwater No
10 02-OSW6 Seawater_2 OSW 2112 2112_B_OSW Powerwater No
# ℹ 149 more rows
# ℹ Use `print(n = ...)` to see more rows
Thanks again for looking into this!
I can't reproduce your problem. Would you mind updating restarting your R version, reinstalling fantaxtic to the latest version, and check if the issue persists?
Hi, that resolved the issue! Thanks a lot for your time!
Hi, first of all I want to say thanks for developing such a cool tool for analyzing microbiome data! It is super helpful.
I am currently having issues with the top_taxa function. I have a phyloseq object (rel_2123_core) for which I want to calculate the most abundant families in each grouping. I have 30 groups (Date_Fraction in my phyloseq sample data) in my ps_obj, and I was hoping to get the top 20 families for each group, by using:
I set by_proportion = FALSE since these are already relative abundances. and I get:
Which is not what I expected. I expected it to name 20 ASVs for each group. I know there are some overlaps between my groups but I know there are also unique families, so there should be more taxa if it does consider the grouping factor. I also checked this witn ntaxa=2, and also only got 2 families. The NAs under the Date_Fraction column of the result also tells me something is not right. I also tried the ps_obj with absolute abundances, set by_proportion = TRUE, and got the same result.
Am I using the top_taxa function wrong? If so, how do I correctly use the function so that it gets the top taxa for the specified group? Thanks a lot for your help!