getPopulationFit() return

flde commented 1 year ago

Dear Lamian Team,

Many thanks for the great tool! I run the XDE workflow with my own data follwoing following the tutorial.

The function getPopulationFit() returns a list of data frames with ncol equal to num.timepoint set in getPopulationFit() which is by default 1000. Later you rename the column names of colnames(Res$populationFit[[1]]) with the colnames(Res$expr) from the expression matrix. That works because the tutorial the expression matrix has the same number of columns as num.timepoint (1000).

That does not work if the expression matrix ncol does not fit the num.timepoint parameter, right? Can you explain the ressoning behind that?

I can work around this issue by setting num.timepoint=ncol(Res$expr) which solves the problem for my own data. But still get an error from the final plotXDEHm() call Error in rep(1:length(rle), rle): invalid 'times' argument.

Highly appreciate your help to understand this last step!

Best wishes, Florian

pecoraro90 commented 1 year ago

Hi Florian, I am having the same issue. Did you succeed in fixing it?

Winnie09 commented 1 year ago

Hi @flde and @pecoraro90 Thank you for your interest in the package Lamian! I looked into the functions getPopulationFit() and plotXDEHm(), and @flde is right that in the previous version, the number of cells in the gene expression matrix happened to be the same as the subsampling number of time points. I have updated both functions to solve the issue. Also, I removed the unnecessary line colnames(Res$populationFit[[1]]) <- colnames(Res$populationFit[[2]]) <- colnames(Res$expr) in the online manual. Please let me know if it works now. Thanks!

In the following, I show two examples where there are not 1000 cells in the data.

(1) When there are 1100 > 1000 cells (different from the default `num.timepoint = 1000`)

library(Lamian) data(expdata) expdata$cellanno <- rbind(expdata$cellanno, expdata$cellanno[1:100, ]) expdata$cellanno[1001:1100, 1] <- paste0(expdata$cellanno[1001:1100, 1], ';dup') expdata$pseudotime <- c(expdata$pseudotime, seq(1001, 1100)) names(expdata$pseudotime) <- expdata$cellanno[,1] expdata$expr <- cbind(expdata$expr, expdata$expr[,1:100]) colnames(expdata$expr)[1001:1100] <- paste0(colnames(expdata$expr)[1001:1100], ';dup') Res <- lamian_test( expr = expdata$expr, cellanno = expdata$cellanno, pseudotime = expdata$pseudotime, design = expdata$design, test.type = 'variable', testvar = 2, permuiter = 5, ncores = 1 )

stat <- Res$statistics stat <- stat[order(stat[, 1],-stat[, 3]),]

diffgene <- rownames(stat[stat[, grep('^fdr.*overall$', colnames(stat))] < 0.05, ])

Res$populationFit <- getPopulationFit(Res, gene = diffgene, type = 'variable') Res$covariateGroupDiff <- getCovariateGroupDiff(testobj = Res, gene = diffgene) Res$cluster <- clusterGene(Res, gene = diffgene, type = 'variable', k = 5) plotXDEHm( Res, cellWidthTotal = 180, cellHeightTotal = 350, subsampleCell = FALSE, sep = ':.*' )

(2) When there are 900 < 1000 cells (different from the default `num.timepoint = 1000`)

data(expdata) expdata$cellanno <- expdata$cellanno[1:9e2, ] expdata$pseudotime <- expdata$pseudotime[1:9e2] expdata$expr <- expdata$expr[,1:9e2]

Res <- lamian_test( expr = expdata$expr, cellanno = expdata$cellanno, pseudotime = expdata$pseudotime, design = expdata$design, test.type = 'variable', testvar = 2, permuiter = 5, ncores = 1 )

stat <- Res$statistics stat <- stat[order(stat[, 1],-stat[, 3]),] diffgene <- rownames(stat[stat[, grep('^fdr.*overall$', colnames(stat))] < 0.05, ])

Res$populationFit <- getPopulationFit(Res, gene = diffgene, type = 'variable') Res$covariateGroupDiff <- getCovariateGroupDiff(testobj = Res, gene = diffgene) Res$cluster <- clusterGene(Res, gene = diffgene, type = 'variable', k = 5) plotXDEHm( Res, cellWidthTotal = 180, cellHeightTotal = 350, subsampleCell = FALSE, sep = ':.*' )

flde commented 1 year ago

Hello @Winnie09,

Many thanks for the example code and corrections. Will check it out soon!

Best wishes, Florian

Winnie09 / Lamian