diff_analysis for picrust2 predicted pathway abundance table?

BinhongLiu commented 3 years ago

Hi, I just trying to apply _diffanalysis function to the phyloseq object built with picrust2-predicted pathway abundance table.

The otu-table is like this:

pathway	sampleD1-10F	sampleD1-10	sampleD1-11F	sampleD1-11	sampleD1-12F
pathway1	11600.31	10598.87	13448.91	13520.78	12997.12
pathway2	0	0	0	0	0
pathway3	23.7838	70.1857	161.0672	85.7279	62.5566
pathway4	10073.85	8500.755	14023.53	9969.657	11871.62
pathway5	14118.24	13545.21	17733.63	17366.78	15573.78
pathway6	4106.691	3976.353	5092.591	4838.09	4042.221

The tax_table is like this:

pathway	Rank1
pathway1	N10-formyl-tetrahydrofolate_biosynthesis
pathway2	4-hydroxyphenylacetate_degradation
pathway3	superpathway_of_chorismate_metabolism
pathway4	homolactic_fermentation
pathway5	glycolysisIII(from_glucose)
pathway6	superpathway_of_arginine_and_polyamine_biosynthesis

There was an error happened when running the _diffanalysis analysis: Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

The code is here:

metadata <- read.csv("D:/16S_sequencing_data/YumingWang/metadata.txt",sep = "\t")
metadata <- subset(metadata, Group %in% c("High","Low"))
rownames(metadata) <- metadata$sample.id
test_cyc_path <- read.table("D:/16S_sequencing_data/YumingWang/HL/picrust2_out/pathways_out/path_abun_unstrat1.tsv", header=TRUE, sep="\t", row.names=1)
pathway_des <- openxlsx::read.xlsx("D:/16S_sequencing_data/YumingWang/HL/picrust2_out/pathways_out/pathway_description.xlsx", colNames=T, rowNames=T)
colnames(test_cyc_path) <- gsub("[.]", "-", colnames(test_cyc_path))
ps_cyc_path <- phyloseq(sample_data(metadata),
                         tax_table(as.matrix(pathway_des)),
                         otu_table(as.matrix(test_cyc_path),taxa_are_rows=T))
ps_xx <- microbiome::transform(ps_xx, "compositional")
ps_xx1 <- subset_taxa(subset_samples(ps_xx,Study=="1st" & Location=="stool"&Day=="28")) 
head(data.frame(sample_data(ps_xx1)))

              sample.id Location Group Day Swine_ID Study X10_16S_Copies_mg

sampleD28-10F sampleD28-10F stool High 28 10 1st 1.520 sampleD28-11F sampleD28-11F stool High 28 11 1st 4.240 sampleD28-12F sampleD28-12F stool High 28 12 1st 9.600 sampleD28-13F sampleD28-13F stool Low 28 13 1st 0.977 sampleD28-14F sampleD28-14F stool Low 28 14 1st 5.140 sampleD28-15F sampleD28-15F stool Low 28 15 1st 8.860

diffres <- diff_analysis(ps_xx1, 
                          classgroup="Group",
                          mlfun="lda",
                          filtermod="fdr",
                          firstcomfun = "kruskal.test",
                          firstalpha=0.05,
                          strictmod=TRUE,
                          secondcomfun = "wilcox.test",
                          subclmin=3,
                          subclwilc=TRUE,
                          secondalpha=0.05,
                          lda=3)

Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

Is there any method to fix this error? Many thanks!

xiangpin commented 3 years ago

The old version don't support the function abundance table, but the newest version (github) has supported it. Note, if you don't want to use the relative abundance, and you want to use the table you has processed previously. You should set standard_method argument to count , which mean that original input dataset (you has processed previously,eg: this is efficient to use the count data after rarefing) will be used. Of course, you also can use the method of decostand in vegan to standardize.

BinhongLiu commented 3 years ago

Yes, you were right! I used the relative abundance pathway data (_ps_xx <- microbiome::transform(psxx, "compositional")) to perform the _diffanalysis. Also, I've installed the newest version (github). However, the error was still there. It seems to be that something related with the row.names is wrong but I don't know how to fix it. Could you provide a demo function abundance table and tax_table that could perform the analysis successfully? Thank you for your help! Hongbin

xiangpin commented 3 years ago

You can try to add type="others" in diff_analysis. I could not get more detail information,or would you please send the phyloseq class to me by email.

BinhongLiu commented 3 years ago

Thanks for your help! I've save the ps object and sent it to your email xshuangbin@163.com !

xiangpin commented 3 years ago

Ok，I viewed table of the pathway annotation, it only have one rank, don't contain hierarchical relationship. In the condition, the input is phyloseq class, it was not be supported. There are two methods to solve the problem.

first, you can complete the pathway annotation, because the kegg annotation also has hierarchical relationship.
second, you can convert phyloseq class to data.frame. since diff_analysis also support data.frame as input.

library(MicrobiotaProcess)
tax <- data.frame(ps_xx1@tax_table, check.names=FALSE)
sampleda <- data.frame(ps_xx1@sam_data, check.names=FALSE)
otuda <- data.frame(ps_xx1@otu_table, check.names=FALSE)
featuretab <- merge(otuda, tax, by=0)
rownames(featuretab) <- as.vector(featuretab$Rank1)
featuretab$Row.names <- NULL
featuretab$Rank1 <- NULL

featuretab <- data.frame(t(featuretab)*100, check.names=FALSE)

diffres <- diff_analysis(obj=featuretab,
                         sampleda=sampleda,
                         classgroup="Group",
                         #alltax=FALSE,
                         mlfun="lda",
                         filtermod="pvalue",
                         standard_method=NULL,
                         firstcomfun = "kruskal.test",
                         firstalpha=0.05,
                         strictmod=TRUE,
                         secondcomfun = "wilcox.test",
                         subclmin=3,
                         type="others",
                         subclwilc=TRUE,
                         secondalpha=0.01,
                         lda=3)
diffres

the taxda was not be provided, so the result is not be visualized by ggdiffcalde. But you can use ggdiffbox to visualize it.

p <- ggdiffbox(
               diffres, 
               l_xlabtext="relative abundance (%)",
               box_notch=FALSE, 
               colorlist=c("#00AED7", "#009E73")
        )
p

test1

you also can use ggdifftaxbar to visualize each discriminative feature by barplot.

ggdifftaxbar(
       obj=diffres, 
       xtextsize=1.5, 
       output="function_biomarkder_barplot",
       coloslist=c("#00AED7", "#009E73")
)

BinhongLiu commented 3 years ago

Thanks so much for your help! This method solves my issue perfectly! Cheers!

YuLab-SMU / MicrobiotaProcess

diff_analysis for picrust2 predicted pathway abundance table? #17