HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools
161 stars 33 forks source link

mmDS error: gradient function must return a numeric vector of length 4 #24

Open owenwilkins opened 4 years ago

owenwilkins commented 4 years ago

Hello, thanks for providing this much needed package for DEG analysis of single cells from multiple subjects.

I am trying to use the mixed modeling approach using mmDS() however am receiving an error for each model that states: : gradient function must return a numeric vector of length 4>

I believe this error is being generated internally by glmmTMB but cannot determine why it is being generated or how to resolve. I'm including the full error message, the structure of my SingleCellExperiment object, function call to mmDS(), and SessionInfo() below.

Any assistance/insight you might be able to provide on this issue would be greatly appreciated!

Thanks again!

\ Error message:

<simpleError in (function (start, objective, gradient = NULL, hessian = NULL, ..., scale = 1, control = list(), lower = -Inf, upper = Inf) { par <- setNames(as.double(start), names(start)) n <- length(par) iv <- integer(78 + 3 * n) v <- double(130 + (n * (n + 27))/2) .Call(C_port_ivset, 2, iv, v) if (length(control)) { nms <- names(control) if (!is.list(control) || is.null(nms)) stop("'control' argument must be a named list") pos <- pmatch(nms, names(port_cpos)) if (any(nap <- is.na(pos))) { warning(sprintf(ngettext(length(nap), "unrecognized control element named %s ignored", "unrecognized control elements named %s ignored"), paste(sQuote(nms[nap]), collapse = ", ")), domain = NA) pos <- pos[!nap] control <- control[!nap] } ivpars <- pos <= 4 vpars <- !ivpars if (any(ivpars)) iv[port_cpos[pos[ivpars]]] <- as.integer(unlist(control[ivpars])) if (any(vpars)) v[port_cpos[pos[vpars]]] <- as.double(unlist(control[vpars])) } obj <- quote(objective(.par, ...)) rho <- new.env(parent = environment()) assign(".par", par, envir = rho) grad <- hess <- low <- upp <- NULL if (!is.null(gradient)) { grad <- quote(gradient(.par, ...)) if (!is.null(hessian)) { if (is.logical(hessian)) stop("logical 'hessian' argument not allowed. See documentation.") hess <- quote(hessian(.par, ...)) } } if (any(lower != -Inf) || any(upper != Inf)) { low <- rep_len(as.double(lower), length(par)) upp <- rep_len(as.double(upper), length(par)) } else low <- upp <- numeric() .Call(C_port_nlminb, obj, grad, hess, rho, low, upp, d = rep_len(as.double(scale), length(par)), iv, v) iv1 <- iv[1L] list(par = get(".par", envir = rho), objective = v[10L], convergence = (if (iv1 %in% 3L:6L) 0L else 1L), iterations = iv[31L], evaluations = c(function= iv[6L], gradient = iv[30L]), message = if (19 <= iv1 && iv1 <= 43) { if (any(B <- iv1 == port_cpos)) sprintf("'control' component '%s' = %g, is out of range", names(port_cpos)[B], v[iv1]) else sprintf("V[IV[1]] = V[%d] = %g is out of range (see PORT docu.)", iv1, v[iv1]) } else port_msg(iv1))})(c(beta = 0, beta = 0, betad = 0, theta = 0), function (x = last.par[-random], ...) { if (tracepar) { cat("par:\n") print(x) } if (!validpar(x)) return(NaN) ans <- try({ if (MCcontrol$doMC) { ff(x, order = 0) MC(last.par, n = MCcontrol$n, seed = MCcontrol$seed, order = 0) } else ff(x, order = 0) }, silent = silent) if (is.character(ans)) NaN else ans}, function (x = last.par[-random], ...) { ans <- try({ if (MCcontrol$doMC) { ff(x, order = 0) MC(last.par, n = MCcontrol$n, seed = MCcontrol$seed, order = 1) } else ff(x, order = 1) }, silent = silent) if (is.character(ans)) ans <- rep(NaN, length(x)) if (tracemgc) cat("outer mgc: ", max(abs(ans)), "\n") ans}, control = list(iter.max = 300, eval.max = 400)): gradient function must return a numeric vector of length 4>

SingleCellExperiment object:

> sce2 class: SingleCellExperiment dim: 100 36910 metadata(1): experiment_info assays(2): counts logcounts rownames(100): FO538757.2 AP006222.2 ... DNAJC16 AGMAT rowData names(0): colnames(36910): /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/3401HC_AAACCTGAGTGAACGC /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/3401HC_AAACCTGCAGCTCGCA ... /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/PD134_TTTGTCACATGACGGA /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/PD134_TTTGTCAGTGCGGTAA colData names(4): cluster_id sample_id group_id sample_id2 reducedDimNames(2): PCA UMAP spikeNames(0): altExpNames(0):

Function call to mmDS I am using is:

mm <- mmDS(sce2, method = "nbinom", n_threads=10)

sessionInfo() ` R version 3.6.0 (2019-04-26) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux

Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] statmod_1.4.34 muscat_1.0.0
[3] scater_1.14.6 ggplot2_3.3.0
[5] SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 [7] DelayedArray_0.12.2 BiocParallel_1.20.1
[9] matrixStats_0.55.0 Biobase_2.46.0
[11] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[13] IRanges_2.20.2 S4Vectors_0.24.3
[15] BiocGenerics_0.32.0

loaded via a namespace (and not attached): [1] backports_1.1.5 circlize_0.4.8 Hmisc_4.3-1
[4] blme_1.0-4 plyr_1.8.6 TMB_1.7.16
[7] splines_3.6.0 listenv_0.8.0 digest_0.6.25
[10] foreach_1.4.8 htmltools_0.4.0 viridis_0.5.1
[13] gdata_2.18.0 lmerTest_3.1-1 magrittr_1.5
[16] checkmate_2.0.0 memoise_1.1.0 cluster_2.0.8
[19] doParallel_1.0.15 limma_3.42.2 ComplexHeatmap_2.2.0
[22] globals_0.12.5 annotate_1.64.0 RcppParallel_4.4.4
[25] prettyunits_1.1.1 jpeg_0.1-8.1 colorspace_1.4-1
[28] blob_1.2.1 xfun_0.12 dplyr_0.8.5
[31] crayon_1.3.4 RCurl_1.98-1.1 genefilter_1.68.0
[34] lme4_1.1-21 survival_3.1-11 iterators_1.0.12
[37] glue_1.3.1 gtable_0.3.0 zlibbioc_1.32.0
[40] XVector_0.26.0 GetoptLong_0.1.8 BiocSingular_1.2.2
[43] future.apply_1.4.0 shape_1.4.4 scales_1.1.0
[46] DBI_1.1.0 edgeR_3.28.1 Rcpp_1.0.3
[49] viridisLite_0.3.0 xtable_1.8-4 progress_1.2.2
[52] htmlTable_1.13.3 clue_0.3-57 foreign_0.8-71
[55] bit_1.1-15.2 rsvd_1.0.3 Formula_1.2-3
[58] htmlwidgets_1.5.1 gplots_3.0.3 RColorBrewer_1.1-2
[61] acepack_1.4.1 farver_2.0.3 pkgconfig_2.0.3
[64] XML_3.99-0.3 uwot_0.1.5 nnet_7.3-12
[67] locfit_1.5-9.1 labeling_0.3 tidyselect_1.0.0
[70] rlang_0.4.5 reshape2_1.4.3 AnnotationDbi_1.48.0
[73] munsell_0.5.0 tools_3.6.0 RSQLite_2.2.0
[76] stringr_1.4.0 knitr_1.28 bit64_0.9-7
[79] caTools_1.18.0 purrr_0.3.3 future_1.16.0
[82] nlme_3.1-139 pbkrtest_0.4-8.6 compiler_3.6.0
[85] rstudioapi_0.11 beeswarm_0.2.3 png_0.1-7
[88] variancePartition_1.16.1 tibble_2.1.3 geneplotter_1.64.0
[91] stringi_1.4.6 lattice_0.20-38 Matrix_1.2-17
[94] nloptr_1.2.2 vctrs_0.2.4 pillar_1.4.3
[97] lifecycle_0.2.0 GlobalOptions_0.1.1 RcppAnnoy_0.0.16
[100] BiocNeighbors_1.4.2 cowplot_1.0.0 data.table_1.12.8
[103] bitops_1.0-6 irlba_2.3.3 colorRamps_2.3
[106] R6_2.4.1 latticeExtra_0.6-29 KernSmooth_2.23-15
[109] gridExtra_2.3 vipor_0.4.5 codetools_0.2-16
[112] boot_1.3-22 MASS_7.3-51.4 gtools_3.8.1
[115] assertthat_0.2.1 DESeq2_1.26.0 rjson_0.2.20
[118] withr_2.1.2 sctransform_0.2.1 GenomeInfoDbData_1.2.2
[121] hms_0.5.3 grid_3.6.0 rpart_4.1-15
[124] glmmTMB_1.0.0 minqa_1.2.4 DelayedMatrixStats_1.8.0 [127] numDeriv_2016.8-1.1 base64enc_0.1-3 ggbeeswarm_0.6.0 `

plger commented 4 years ago

Dear Owen,

A few questions:

1) could you tell us whether you installed the bioconductor release of muscat, the devel, or a github branch?

2) what is the class(counts(sce2)) ?

3) would you be able to make a much reduced version of the object (just a subset of cells/genes) with which the error is reproducible, and share it with us (you can scramble gene names and colData if you like)? That's the easiest to figure out what's going on...

4) in case not, can you reproduce the error using a single thread, and if so can you post a traceback() ? (at the moment the error is rather hard to read)

(On a side note, the nbinom glmm version of mmDS is extremely slow and doesn't appear better than the other modes in our benchmark)

Pierre-Luc

owenwilkins commented 4 years ago

Hi @plger , thanks very much for your super quick response.

1) The Bioconductor version. I did try to install the Github version initially, however there was an issue with the HPC cluster that I tried to complete this install on, so I used the Bioconductor version.

2) Below is the output from class(counts(sce2)):

[1] "dgCMatrix" attr(,"package") [1] "Matrix"

3) I can definitely send a smaller version to help identofy the issue. What is the best way for me to get this file to you?

Thanks for the note on mmDS with nBinom, will give this a go with the other approaches when its all working.

Thanks again, Owen

plger commented 4 years ago

Thanks for the precisions; I sent you an invite by email (@ Dartmouth), let me know in case you didn't get it.

owenwilkins commented 4 years ago

Perfect, thanks so much. Have just responded and send the file via email. let me know if you have any issues receiving it that way.

plger commented 4 years ago

Hi @owenwilkins

So I'm not actually getting exactly the same error you are getting, but here are the things I had to address to make it work:

Let me know if that fixes your problem. Pierre-Luc

owenwilkins commented 4 years ago

@plger , thanks very much for your quick response again!

Seems like I did more harm than good creating a reduced object that was easily sharable. The barcodes issue and the 0 reads are definitely caused by how I cut up the original dataset to make it smaller, so sorry about that.

I've installed the development version of muscat using the code you provided, and tried to run mm <- mmDS(sce3, method = "vst", vst = "sctransform", n_cells = 10) although I am getting an error: No cluster has at least 2 samples with at least 10 cells. This might be related to how I made the smaller object to share.

Would it be possible for you to share the code you used to make it work? Might be the easiest way of diagnosing if its something I did to the dataset when subsetting it.

Thanks again for all your help on this! Owen

owenwilkins commented 4 years ago

Also, not sure if there may be an issue with prepSCE in the development version. Each time I try to run:

sce <- prepSCE(sce, cluster_id = "celltype", sample_id = "subject", group_id = "sample") I get an error: unused arguments (cluster_id = "celltype", sample_id = "subject", group_id = "sample"), despite confirming than celltype, subject, and sample are all valid columns in sce@colData

HelenaLC commented 4 years ago

Re this last issue- these arguments have previously been changed to cluster_id -> kid, sample_id -> sid, group_id -> gid; not sure which version exactly you are running, but I'd recommend trying prepSCE(sce, kid = "celltype", sid = "subject", gid = "sample") and/or checking the function documentation.

owenwilkins commented 4 years ago

Ah perfect thanks, that solved that part of the issue