YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164
167 stars 35 forks source link

diff_analysis for picrust2 predicted pathway abundance table? #17

Open BinhongLiu opened 3 years ago

BinhongLiu commented 3 years ago

Hi, I just trying to apply _diffanalysis function to the phyloseq object built with picrust2-predicted pathway abundance table.

The otu-table is like this:

pathway sampleD1-10F sampleD1-10 sampleD1-11F sampleD1-11 sampleD1-12F
pathway1 11600.31 10598.87 13448.91 13520.78 12997.12
pathway2 0 0 0 0 0
pathway3 23.7838 70.1857 161.0672 85.7279 62.5566
pathway4 10073.85 8500.755 14023.53 9969.657 11871.62
pathway5 14118.24 13545.21 17733.63 17366.78 15573.78
pathway6 4106.691 3976.353 5092.591 4838.09 4042.221

The tax_table is like this:

pathway Rank1
pathway1 N10-formyl-tetrahydrofolate_biosynthesis
pathway2 4-hydroxyphenylacetate_degradation
pathway3 superpathway_of_chorismate_metabolism
pathway4 homolactic_fermentation
pathway5 glycolysisIII(from_glucose)
pathway6 superpathway_of_arginine_and_polyamine_biosynthesis

There was an error happened when running the _diffanalysis analysis: Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

The code is here:

metadata <- read.csv("D:/16S_sequencing_data/YumingWang/metadata.txt",sep = "\t")
metadata <- subset(metadata, Group %in% c("High","Low"))
rownames(metadata) <- metadata$sample.id
test_cyc_path <- read.table("D:/16S_sequencing_data/YumingWang/HL/picrust2_out/pathways_out/path_abun_unstrat1.tsv", header=TRUE, sep="\t", row.names=1)
pathway_des <- openxlsx::read.xlsx("D:/16S_sequencing_data/YumingWang/HL/picrust2_out/pathways_out/pathway_description.xlsx", colNames=T, rowNames=T)
colnames(test_cyc_path) <- gsub("[.]", "-", colnames(test_cyc_path))
ps_cyc_path <- phyloseq(sample_data(metadata),
                         tax_table(as.matrix(pathway_des)),
                         otu_table(as.matrix(test_cyc_path),taxa_are_rows=T))
ps_xx <- microbiome::transform(ps_xx, "compositional")
ps_xx1 <- subset_taxa(subset_samples(ps_xx,Study=="1st" & Location=="stool"&Day=="28")) 
head(data.frame(sample_data(ps_xx1)))
              sample.id Location Group Day Swine_ID Study X10_16S_Copies_mg

sampleD28-10F sampleD28-10F stool High 28 10 1st 1.520 sampleD28-11F sampleD28-11F stool High 28 11 1st 4.240 sampleD28-12F sampleD28-12F stool High 28 12 1st 9.600 sampleD28-13F sampleD28-13F stool Low 28 13 1st 0.977 sampleD28-14F sampleD28-14F stool Low 28 14 1st 5.140 sampleD28-15F sampleD28-15F stool Low 28 15 1st 8.860

diffres <- diff_analysis(ps_xx1, 
                          classgroup="Group",
                          mlfun="lda",
                          filtermod="fdr",
                          firstcomfun = "kruskal.test",
                          firstalpha=0.05,
                          strictmod=TRUE,
                          secondcomfun = "wilcox.test",
                          subclmin=3,
                          subclwilc=TRUE,
                          secondalpha=0.05,
                          lda=3)

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

Is there any method to fix this error? Many thanks!

xiangpin commented 3 years ago

The old version don't support the function abundance table, but the newest version (github) has supported it. Note, if you don't want to use the relative abundance, and you want to use the table you has processed previously. You should set standard_method argument to count , which mean that original input dataset (you has processed previously,eg: this is efficient to use the count data after rarefing) will be used. Of course, you also can use the method of decostand in vegan to standardize.

BinhongLiu commented 3 years ago

Yes, you were right! I used the relative abundance pathway data (_ps_xx <- microbiome::transform(psxx, "compositional")) to perform the _diffanalysis. Also, I've installed the newest version (github). However, the error was still there. It seems to be that something related with the row.names is wrong but I don't know how to fix it. Could you provide a demo function abundance table and tax_table that could perform the analysis successfully? Thank you for your help! Hongbin

xiangpin commented 3 years ago

You can try to add type="others" in diff_analysis. I could not get more detail information,or would you please send the phyloseq class to me by email.

BinhongLiu commented 3 years ago

Thanks for your help! I've save the ps object and sent it to your email xshuangbin@163.com !

xiangpin commented 3 years ago

Ok,I viewed table of the pathway annotation, it only have one rank, don't contain hierarchical relationship. In the condition, the input is phyloseq class, it was not be supported. There are two methods to solve the problem.

library(MicrobiotaProcess)
tax <- data.frame(ps_xx1@tax_table, check.names=FALSE)
sampleda <- data.frame(ps_xx1@sam_data, check.names=FALSE)
otuda <- data.frame(ps_xx1@otu_table, check.names=FALSE)
featuretab <- merge(otuda, tax, by=0)
rownames(featuretab) <- as.vector(featuretab$Rank1)
featuretab$Row.names <- NULL
featuretab$Rank1 <- NULL

featuretab <- data.frame(t(featuretab)*100, check.names=FALSE)

diffres <- diff_analysis(obj=featuretab,
                         sampleda=sampleda,
                         classgroup="Group",
                         #alltax=FALSE,
                         mlfun="lda",
                         filtermod="pvalue",
                         standard_method=NULL,
                         firstcomfun = "kruskal.test",
                         firstalpha=0.05,
                         strictmod=TRUE,
                         secondcomfun = "wilcox.test",
                         subclmin=3,
                         type="others",
                         subclwilc=TRUE,
                         secondalpha=0.01,
                         lda=3)
diffres          

the taxda was not be provided, so the result is not be visualized by ggdiffcalde. But you can use ggdiffbox to visualize it.

p <- ggdiffbox(
               diffres, 
               l_xlabtext="relative abundance (%)",
               box_notch=FALSE, 
               colorlist=c("#00AED7", "#009E73")
        )
p

test1

you also can use ggdifftaxbar to visualize each discriminative feature by barplot.

ggdifftaxbar(
       obj=diffres, 
       xtextsize=1.5, 
       output="function_biomarkder_barplot",
       coloslist=c("#00AED7", "#009E73")
)
BinhongLiu commented 3 years ago

Thanks so much for your help! This method solves my issue perfectly! Cheers!