Closed DarioS closed 1 week ago
Resource use is controlled by a combination of OpenMP and BiocParallel - the latter for parallelising over the nhoods, and OpenMP for multi-threading in the Rcpp code. A default for glmm.solver
will be added. In terms of choosing between solvers, Fisher is Quasi-Newton, so should have fast convergence when close to the (local) optimum. The HE is good for solving with a pre-defined covariance matrix, such as a genetic relationship matrix. HE-NNLS is implemented for the cases where variance parameters are negative, which is caused by "true" values being very close to zero. The solver will give a message and highlight that it is switching to the constrained optimisation.
GLMMs tend to scale poorly - while I have implemented a range of measures to cut this down, i.e. parallelisation and multi-threading, however, the bottlenecks are several dense matrix multiplications which are ~ $O(n^3)$.
I'm not sure why your Rstudio is ignoring environmental variables, as this is the recommendation: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#OpenMP-support - you may need to seek Rstudio support to resolve the issue.
Specifying sample ID as a fixed effect isn't a quick workaround to avoid enriched neighbourhoods comprised of one sample.
design <- data.frame(colData(MILOdata))[, c("Sample", "Smoker")]
design <- distinct(design)
rownames(design) <- design$Sample
testNhoods(MILOdata, design = ~ Smoker + Sample, design.df = design, fdr.weighting = "graph-overlap")
Using TMM normalisation
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
SampleOSCC_26-P
> library(BiocParallel)
> da_results <- testNhoods(hn_milo, design = ~ Condition+(1|patient_ID),
+ design.df = hn_design,
+ fdr.weighting="graph-overlap")
Random effects found
Using TMM normalisation
Running GLMM model - this may take a few minutes
Error in testNhoods(hn_milo, design = ~Condition + (1 | patient_ID), design.df = hn_design, :
Lowest traceback returned: 5: handle_error(e)
4: h(simpleError(msg, call))
3: .handleSimpleError(function (e)
{
annotated_condition <- handle_error(e)
stop(annotated_condition)
}, "argument is of length zero", base::quote(if (!glmm.control$solver %in%
c("HE", "Fisher", "HE-NNLS")) {
stop(glmm.control$solver, " not recognised - must be HE, HE-NNLS or Fisher")
}))
2: fitGLMM(X = Xmodel, Z = Zmodel, y = Y[i, ], offsets = off.sets,
random.levels = randlevels, REML = reml, dispersion = disper[i],
geno.only = genonly, Kin = kinship, glmm.control = glmm.contr)
1: FUN(...)
In addition: There were 16 warnings (use warnings() to see them)
Hi @MikeDMorgan, I performed a GLMM analysis. However, it threw an error that I couldn't identify. Is there any way to check where the error is coming from in this situation?
Parameter glmm.solver
is mandatory if a mixed model fitted, but it had no default. That one was already resolved by #328.
Here is another problem: after performing the GLMM, my DA results include NA values. However, plotNhoodGraphDA
and groupNhoods
cannot handle NA values.
da_results logFC logCPM SE tvalue PValue patient_ID variance Converged Dispersion Logliklihood Nhood 1 -22.8268334 8.004828 10552.160078 -0.002163238 0.9982740 3.540525e+01 FALSE 0.01097306 -82.278581 1 2 -22.4599347 10.410420 10536.597689 -0.002131612 0.9982992 9.792410e+00 FALSE 0.01154928 -84.030955 2 3 -21.0766982 9.975377 10408.624273 -0.002024926 0.9983843 9.374782e-01 FALSE 0.01201461 -129.165084 3 4 NA 10.662702 NA NA NA NA FALSE NA NA 4 5 -22.1270556 9.093788 10354.004528 -0.002137053 0.9982949 7.603262e-01 FALSE 0.01194637 -64.592597 5 6 -22.3102589 8.740509 10457.424887 -0.002133437 0.9982978 3.258474e+01 FALSE 0.01143842 -110.076552 6 7 -21.4542960 13.159483 10246.905729 -0.002093734 0.9983294 2.088114e+01 FALSE 0.01157411 -81.276525 7
plotNhoodGraphDA(hn_milo, da_results, layout="UMAP",alpha=0.1) Error in
[<-.data.frame
(*tmp*
, signif_res$SpatialFDR > alpha, res_column, : missing values are not allowed in subscripted assignments of data framesda_results <- groupNhoods(hn_milo, da_results,compute.new =T,da.fdr =0.1)# max.lfc.delta = 2, Found NA DA neighbourhoods at FDR 10% Error in ..subscript.2ary(x, l[[1L]], l[[2L]], drop = drop[1L]) : NA subscripts in x[i,j] not supported for 'x' inheriting from sparseMatrix In addition: Warning message: In groupNhoods(hn_milo, da_results, compute.new = T, da.fdr = 0.1) : NA values found in SpatialFDR vector
@yulijia You can use subset.nhoods
to plot the nhoods with non-NA values. These usually arise because there is model separation - you have a variable that perfectly separates zero from non-zero counts in one or more nhoods. You can check for these using the checkSeparation
function, with the goal being to adjust either your NN-graph or nhood refinement such that you don't get this separation (or you exclude these from your DA testing).
On a separate note - your issue is unrelated to the OP, so you should open a new issue not append yours to the end of this one.
Is it feasible to add a progress bar if a mixed model is used? It has been running for the past five hours without a progress message.
It also appears to consume all CPUs on a server, causing CPU starvation due to RStudio Server ignoring a user's
~/.bashrc
file with:I wonder how to choose between different options for
glmm.solver
in case one of them is less resource-intensive. I tried"Fisher"
.