Open FrancescoVit opened 4 years ago
Sorry for the late answer!
That's rather tricky, as report.data.frame tends to summarize variables (computing the mean, SD, etc.), whereas what we'd need here is to report the values directly.
One option could be to create a new function, report_values or something like that, that would simply report the values by group, but we'd need to think about it and about how well such a function would generalize.
A start would be to reshape your table into a "long" format:
library(tidyverse)

# Relative abundances: taxa as row names, one column per sample
df <- data.frame(
  sample1 = c(0.0084246282, 0.41627099, 0.55475503, 0, 0.000724518, 5.391762e-05, 0.01977092),
  sample2 = c(0.0168571327, 0.132988, 0.80289437, 3.560112e-05, 0.004272135, 0.04238314, 0.000569618),
  sample3 = c(0.0020299288, 0.53813817, 0.42367947, 0.03311006, 0.0007978327, 3.534702e-05, 0.002209189),
  row.names = c("Actinobacteria", "Bacteroidetes", "Firmicutes", "Fusobacteria", "Proteobacteria", "Verrucomicrobia", "Other")
)

# Move taxa into a column, stack the sample columns, then sort by abundance within each sample
df %>%
  tibble::rownames_to_column("Species") %>%
  tidyr::pivot_longer(2:4, names_to = "Sample", values_to = "N") %>%
  dplyr::arrange(Sample, desc(N))
#> # A tibble: 21 x 3
#> Species Sample N
#> <chr> <chr> <dbl>
#> 1 Firmicutes sample1 0.555
#> 2 Bacteroidetes sample1 0.416
#> 3 Other sample1 0.0198
#> 4 Actinobacteria sample1 0.00842
#> 5 Proteobacteria sample1 0.000725
#> 6 Verrucomicrobia sample1 0.0000539
#> 7 Fusobacteria sample1 0
#> 8 Firmicutes sample2 0.803
#> 9 Bacteroidetes sample2 0.133
#> 10 Verrucomicrobia sample2 0.0424
#> # ... with 11 more rows
Created on 2020-03-14 by the reprex package (v0.3.0)
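Building on the long format, here is a minimal sketch of what a "report the top values per group" helper could look like. The function name report_top_n and its top argument are hypothetical and not part of the report package; the sketch also assumes dplyr >= 1.0.0 for slice_max(). The example data is repeated so the block runs on its own.

```r
library(dplyr)

# Same example data as above, repeated so this block is self-contained
df <- data.frame(
  sample1 = c(0.0084246282, 0.41627099, 0.55475503, 0, 0.000724518, 5.391762e-05, 0.01977092),
  sample2 = c(0.0168571327, 0.132988, 0.80289437, 3.560112e-05, 0.004272135, 0.04238314, 0.000569618),
  sample3 = c(0.0020299288, 0.53813817, 0.42367947, 0.03311006, 0.0007978327, 3.534702e-05, 0.002209189),
  row.names = c("Actinobacteria", "Bacteroidetes", "Firmicutes", "Fusobacteria",
                "Proteobacteria", "Verrucomicrobia", "Other")
)

# Hypothetical helper (NOT part of the report package):
# returns one sentence per sample naming the `top` most abundant taxa
# and their combined share of the community.
report_top_n <- function(data, top = 2) {
  data %>%
    tibble::rownames_to_column("Species") %>%
    tidyr::pivot_longer(-Species, names_to = "Sample", values_to = "N") %>%
    dplyr::group_by(Sample) %>%
    # keep the `top` largest values per sample, sorted in descending order
    dplyr::slice_max(N, n = top, with_ties = FALSE) %>%
    dplyr::summarise(
      text = paste0(
        paste0(Species, " (", sprintf("%.2f", 100 * N), "%)", collapse = ", "),
        " together account for ", sprintf("%.1f", 100 * sum(N)),
        "% of ", Sample[1], "."
      ),
      .groups = "drop"
    )
}

report_top_n(df, top = 2)$text
```

This only covers the "top N" part of the request. Flagging absent taxa (values equal to 0) or choosing the wording based on the combined share would need additional rules, which is where the generalizability question comes in.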
Great package, great time saver, love it!
Describe the solution you'd like
I'm a microbial ecologist, and as such I always need to describe the composition of the community in samples and/or groups of samples. The graphical output is usually a stacked bar plot scaled to 100% abundance, accompanied by a textual description of the most abundant taxon/species in each sample. The underlying data are in the form of a data.frame with species (or taxa) as rows and samples as columns. Each number is the relative abundance of a taxon in a sample.
Example data, adapted from https://stackoverflow.com/questions/38452577/making-stack-bar-plot-of-bacterial-abundance
I would like to automatically describe the top N species (i.e. the most abundant rows) in each sample. For example, the data could then be reported as:
"Firmicutes (55.48%) and Bacteroidetes (41.63%) made up almost the entire bacterial community in sample1 (97.1%), while Fusobacteria was absent" or similar.
How could we do it? I don't really know, but it should be related to the report.data.frame() function. I hope this is not too narrow in scope as a feature request; generalized, it would be an extension of the report.data.frame() method that highlights the top N features instead of the mean or other statistics. Using report() on the above produces