mmDS error: gradient function must return a numeric vector of length 4

owenwilkins commented 4 years ago

Hello, thanks for providing this much needed package for DEG analysis of single cells from multiple subjects.

I am trying to use the mixed modeling approach using mmDS() however am receiving an error for each model that states: : gradient function must return a numeric vector of length 4>

I believe this error is being generated internally by glmmTMB but cannot determine why it is being generated or how to resolve. I'm including the full error message, the structure of my SingleCellExperiment object, function call to mmDS(), and SessionInfo() below.

Any assistance/insight you might be able to provide on this issue would be greatly appreciated!

Thanks again!

\ Error message:

SingleCellExperiment object:

> sce2 class: SingleCellExperiment dim: 100 36910 metadata(1): experiment_info assays(2): counts logcounts rownames(100): FO538757.2 AP006222.2 ... DNAJC16 AGMAT rowData names(0): colnames(36910): /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/3401HC_AAACCTGAGTGAACGC /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/3401HC_AAACCTGCAGCTCGCA ... /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/PD134_TTTGTCACATGACGGA /Users/omw/mountpt/HavrdaM/scrna-seq/01_pre_processing/raw_data/PD134_TTTGTCAGTGCGGTAA colData names(4): cluster_id sample_id group_id sample_id2 reducedDimNames(2): PCA UMAP spikeNames(0): altExpNames(0):

Function call to mmDS I am using is:

mm <- mmDS(sce2, method = "nbinom", n_threads=10)

sessionInfo() ` R version 3.6.0 (2019-04-26) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux

Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base

other attached packages: [1] statmod_1.4.34 muscat_1.0.0
[3] scater_1.14.6 ggplot2_3.3.0
[5] SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 [7] DelayedArray_0.12.2 BiocParallel_1.20.1
[9] matrixStats_0.55.0 Biobase_2.46.0
[11] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[13] IRanges_2.20.2 S4Vectors_0.24.3
[15] BiocGenerics_0.32.0

loaded via a namespace (and not attached): [1] backports_1.1.5 circlize_0.4.8 Hmisc_4.3-1
[4] blme_1.0-4 plyr_1.8.6 TMB_1.7.16
[7] splines_3.6.0 listenv_0.8.0 digest_0.6.25
[10] foreach_1.4.8 htmltools_0.4.0 viridis_0.5.1
[13] gdata_2.18.0 lmerTest_3.1-1 magrittr_1.5
[16] checkmate_2.0.0 memoise_1.1.0 cluster_2.0.8
[19] doParallel_1.0.15 limma_3.42.2 ComplexHeatmap_2.2.0
[22] globals_0.12.5 annotate_1.64.0 RcppParallel_4.4.4
[25] prettyunits_1.1.1 jpeg_0.1-8.1 colorspace_1.4-1
[28] blob_1.2.1 xfun_0.12 dplyr_0.8.5
[31] crayon_1.3.4 RCurl_1.98-1.1 genefilter_1.68.0
[34] lme4_1.1-21 survival_3.1-11 iterators_1.0.12
[37] glue_1.3.1 gtable_0.3.0 zlibbioc_1.32.0
[40] XVector_0.26.0 GetoptLong_0.1.8 BiocSingular_1.2.2
[43] future.apply_1.4.0 shape_1.4.4 scales_1.1.0
[46] DBI_1.1.0 edgeR_3.28.1 Rcpp_1.0.3
[49] viridisLite_0.3.0 xtable_1.8-4 progress_1.2.2
[52] htmlTable_1.13.3 clue_0.3-57 foreign_0.8-71
[55] bit_1.1-15.2 rsvd_1.0.3 Formula_1.2-3
[58] htmlwidgets_1.5.1 gplots_3.0.3 RColorBrewer_1.1-2
[61] acepack_1.4.1 farver_2.0.3 pkgconfig_2.0.3
[64] XML_3.99-0.3 uwot_0.1.5 nnet_7.3-12
[67] locfit_1.5-9.1 labeling_0.3 tidyselect_1.0.0
[70] rlang_0.4.5 reshape2_1.4.3 AnnotationDbi_1.48.0
[73] munsell_0.5.0 tools_3.6.0 RSQLite_2.2.0
[76] stringr_1.4.0 knitr_1.28 bit64_0.9-7
[79] caTools_1.18.0 purrr_0.3.3 future_1.16.0
[82] nlme_3.1-139 pbkrtest_0.4-8.6 compiler_3.6.0
[85] rstudioapi_0.11 beeswarm_0.2.3 png_0.1-7
[88] variancePartition_1.16.1 tibble_2.1.3 geneplotter_1.64.0
[91] stringi_1.4.6 lattice_0.20-38 Matrix_1.2-17
[94] nloptr_1.2.2 vctrs_0.2.4 pillar_1.4.3
[97] lifecycle_0.2.0 GlobalOptions_0.1.1 RcppAnnoy_0.0.16
[100] BiocNeighbors_1.4.2 cowplot_1.0.0 data.table_1.12.8
[103] bitops_1.0-6 irlba_2.3.3 colorRamps_2.3
[106] R6_2.4.1 latticeExtra_0.6-29 KernSmooth_2.23-15
[109] gridExtra_2.3 vipor_0.4.5 codetools_0.2-16
[112] boot_1.3-22 MASS_7.3-51.4 gtools_3.8.1
[115] assertthat_0.2.1 DESeq2_1.26.0 rjson_0.2.20
[118] withr_2.1.2 sctransform_0.2.1 GenomeInfoDbData_1.2.2
[121] hms_0.5.3 grid_3.6.0 rpart_4.1-15
[124] glmmTMB_1.0.0 minqa_1.2.4 DelayedMatrixStats_1.8.0 [127] numDeriv_2016.8-1.1 base64enc_0.1-3 ggbeeswarm_0.6.0 `

plger commented 4 years ago

Dear Owen,

A few questions:

1) could you tell us whether you installed the bioconductor release of muscat, the devel, or a github branch?

2) what is the class(counts(sce2)) ?

3) would you be able to make a much reduced version of the object (just a subset of cells/genes) with which the error is reproducible, and share it with us (you can scramble gene names and colData if you like)? That's the easiest to figure out what's going on...

4) in case not, can you reproduce the error using a single thread, and if so can you post a traceback() ? (at the moment the error is rather hard to read)

(On a side note, the nbinom glmm version of mmDS is extremely slow and doesn't appear better than the other modes in our benchmark)

Pierre-Luc

owenwilkins commented 4 years ago

Hi @plger , thanks very much for your super quick response.

1) The Bioconductor version. I did try to install the Github version initially, however there was an issue with the HPC cluster that I tried to complete this install on, so I used the Bioconductor version.

2) Below is the output from class(counts(sce2)):

[1] "dgCMatrix" attr(,"package") [1] "Matrix"

3) I can definitely send a smaller version to help identofy the issue. What is the best way for me to get this file to you?

Thanks for the note on mmDS with nBinom, will give this a go with the other approaches when its all working.

Thanks again, Owen

plger commented 4 years ago

Thanks for the precisions; I sent you an invite by email (@ Dartmouth), let me know in case you didn't get it.

owenwilkins commented 4 years ago

Perfect, thanks so much. Have just responded and send the file via email. let me know if you have any issues receiving it that way.

plger commented 4 years ago

Hi @owenwilkins

So I'm not actually getting exactly the same error you are getting, but here are the things I had to address to make it work:

I think this might be only in the reduced object you sent me, but you're using cell barcodes as column names, and since you have many samples (i.e. captures) this means you have duplicated column names. This might lead into a bunch of trouble (and triggers errors for some muscat functions), so it's advisable to names the cells by something like 'sample.barcode'.
Again probably just related to the reduced object, but you have cells with 0 reads, and I had to remove them to make mmDS work
Your 'subject' variable is stored as a numeric; I'm not 100% sure how this is handled as a random effect, but in any other contexts this would be problematic because the model would assume an numeric relationship between samples, so this should be converted to factor.
You're using a recent version of variancePartition (1.16.1) which wasn't the one when the bioc version of muscat was developed, and this is unfortunately creating problems for mmDS. We've already addressed this, but I guess it will only be in Bioc at the next release in April. In the meantime, you can use our development version of muscat, with which you should be able to run the all mmDS methods without problems. You can install it using BiocManager::install("HelenaLC/muscat", ref="fix_sim").

Let me know if that fixes your problem. Pierre-Luc

owenwilkins commented 4 years ago

@plger , thanks very much for your quick response again!

Seems like I did more harm than good creating a reduced object that was easily sharable. The barcodes issue and the 0 reads are definitely caused by how I cut up the original dataset to make it smaller, so sorry about that.

I've installed the development version of muscat using the code you provided, and tried to run mm <- mmDS(sce3, method = "vst", vst = "sctransform", n_cells = 10) although I am getting an error: No cluster has at least 2 samples with at least 10 cells. This might be related to how I made the smaller object to share.

Would it be possible for you to share the code you used to make it work? Might be the easiest way of diagnosing if its something I did to the dataset when subsetting it.

Thanks again for all your help on this! Owen

owenwilkins commented 4 years ago

Also, not sure if there may be an issue with prepSCE in the development version. Each time I try to run:

sce <- prepSCE(sce, cluster_id = "celltype", sample_id = "subject", group_id = "sample") I get an error: unused arguments (cluster_id = "celltype", sample_id = "subject", group_id = "sample"), despite confirming than celltype, subject, and sample are all valid columns in sce@colData

HelenaLC commented 4 years ago

Re this last issue- these arguments have previously been changed to cluster_id -> kid, sample_id -> sid, group_id -> gid; not sure which version exactly you are running, but I'd recommend trying prepSCE(sce, kid = "celltype", sid = "subject", gid = "sample") and/or checking the function documentation.

owenwilkins commented 4 years ago

Ah perfect thanks, that solved that part of the issue

HelenaLC / muscat

mmDS error: gradient function must return a numeric vector of length 4 #24