Nanostring-Biostats / GeomxTools

Tools for NanoString GeoMx Digital Spatial Profiler data for reading DCC and PKC files to NanoStringGeomxSet class, Normalization and QC.
MIT License
34 stars 16 forks source link

mixedModelDE is only for mixed models #138

Open NicoleEO opened 2 years ago

NicoleEO commented 2 years ago
danielnewhouse commented 2 years ago

Issue #174 is related to this. User receives the "object 'Group' not found" where Group is fixed effect column message. Could we consider adding in a "no random effects present" error message?

lardenoije commented 1 year ago

If there is still interest in adding support for non-mixed model analyses, I have created a modified version of mixedModelDE to fit simple linear models for a single-slide analysis I was working on: https://github.com/lardenoije/GeomxTools/blob/master/R/modelDE.R Perhaps it can be of use.

ahug030 commented 1 year ago

Hi @lardenoije, this could be very useful for me since I'm following this vignette: https://bioconductor.org/packages/devel/workflows/vignettes/GeoMxWorkflows/inst/doc/GeomxTools_RNA-NGS_Analysis.html#7_Differential_Expression

However I only have a single GeoMx slide to analyze so the mixed-effect model part is throwing an error. Would you be able to show how your modelDE.R function would substitute into this section of the vignette?

Within Slide Analysis: Glomeruli vs Tubules

One informative exploration is to study differences between morphological structures. In this example, we can study differential expression between glomeruli and tubules. We will focus on the diseased kidney tissues. Because we are comparing structures that co-exist within the a given tissue we will use the LMM model with a random slope. Morphological structure (Region) is our test variable. We control for tissue subsampling with slide name using a random slope and intercept; the intercept adjusts for the multiple regions placed per unique tissue, since we have one tissue per slide. If multiple tissues are placed per slide, we would change the intercept variable to the unique tissue name (ex: tissue name, Block ID, etc).

In this analysis we save log2 fold change estimates and P-values across all levels in the factor of interest. We also apply a Benjamini-Hochberg multiple test correction.

convert test variables to factors pData(target_demoData)$testRegion <- factor(pData(target_demoData)$region, c("glomerulus", "tubule")) pData(target_demoData)[["slide"]] <- factor(pData(target_demoData)[["slide name"]]) assayDataElement(object = target_demoData, elt = "log_q") <- assayDataApply(target_demoData, 2, FUN = log, base = 2, elt = "q_norm")

run LMM: formula follows conventions defined by the lme4 package results <- c() for(status in c("DKD", "normal")) { ind <- pData(target_demoData)$class == status mixedOutmc <- mixedModelDE(target_demoData[, ind], elt = "log_q", modelFormula = ~ testRegion + (1 + testRegion | slide), groupVar = "testRegion", nCores = parallel::detectCores(), multiCore = FALSE)

# format results as data.frame
r_test <- do.call(rbind, mixedOutmc["lsmeans", ])
tests <- rownames(r_test)
r_test <- as.data.frame(r_test)
r_test$Contrast <- tests

# use lapply in case you have multiple levels of your test factor to
# correctly associate gene name with it's row in the results table
r_test$Gene <- 
    unlist(lapply(colnames(mixedOutmc),
                  rep, nrow(mixedOutmc["lsmeans", ][[1]])))
r_test$Subset <- status
r_test$FDR <- p.adjust(r_test$`Pr(>|t|)`, method = "fdr")
r_test <- r_test[, c("Gene", "Subset", "Contrast", "Estimate", 
                     "Pr(>|t|)", "FDR")]
results <- rbind(results, r_test)

} Note that the example uses nCores = parallel:detectCores() and multiCore = FALSE to implement the parallel package clustering of all available cores. If working in a Windows environment use multicore = FALSE. If in a UNIX-based environment setting multicore = TRUE will parallelize using the mcapply package. If you do not want to use all available cores, change the nCores variable to the desired number to use.