drisso / archive-zinbwave

Archived version of the zinbwave package. All future development will be done on https://github.com/drisso/zinbwave
6 stars 0 forks source link

Error: 'vmmin' is not finite #46

Open davismcc opened 7 years ago

davismcc commented 7 years ago

Hi Davide

I'm trying to run zinbwave on a moderately large dataset:

R> se
 class: SummarizedExperiment
 dim: 2572 8818
 metadata(0):
 assays(1): counts
 rownames(2572): ENSG00000000419_DPM1 ENSG00000002330_BAD ... ENSG00000269028_MTRNR2L12
   ENSG00000269190_FBXO17
 rowData names(24): exprs_collapsed_to ensembl_transcript_id ... is_feature_control is_feature_spike
 colnames(8818): 19776_1#100 19776_1#102 ... 22607_8#98 22607_8#99
 colData names(146): salmon_version samp_type ... cell_filter_lenient size_factor

After running for around 13 hours, the call to zinbwave threw the error below:

 R> print(system.time(se <- zinbwave(se, K = 30, X = "~ experiment",
 +                                  residuals = TRUE,
 +                                  normalizedValues = TRUE)))
 Error in optim(fn = zinb.loglik.regression, gr = zinb.loglik.regression.gradient,  :
   initial value in 'vmmin' is not finite
 In addition: There were 32 warnings (use warnings() to see them)
 Timing stopped at: 5.158e+04 1442 5.303e+04

Any ideas what might be going awry here?

Session info and warnings below. Let me know if you'd like to take a look at the data yourself, and I can share offline.

Best Davis

Warnings:

R> warnings()
 Warning messages:
 1: In simpute.als(x, J, thresh, lambda, maxit, trace.it,  ... :
   Convergence not achieved by 100 iterations
 2: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 3: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 4: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 5: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 6: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 7: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 8: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 9: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 10: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 11: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 12: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 13: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 14: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 15: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 16: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 17: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 18: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 19: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 20: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 21: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 22: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 23: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 24: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 25: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 26: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 27: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 28: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 29: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 30: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 31: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value
 32: In optimize(f = zinb.loglik.dispersion, Y = Y, mu = mu,  ... :
   NA/Inf replaced by maximum positive value

Session Info

R> sessionInfo()
 R version 3.4.1 (2017-06-30)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

 Matrix products: default
 BLAS: /nfs/software/stegle/users/davis/conda-envs/r34/lib/R/lib/libRblas.so
 LAPACK: /nfs/software/stegle/users/davis/conda-envs/r34/lib/R/lib/libRlapack.so

 locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
  [1] doParallel_1.0.10          iterators_1.0.8            foreach_1.4.3              BiocParallel_1.10.1
  [5] ggjoy_0.3.0                future_1.6.0               cowplot_0.8.0              ggthemes_3.4.0
  [9] scater_1.4.0               ggplot2_2.2.1              zinbwave_0.99.6            scone_1.1.2
 [13] clusterExperiment_1.3.2    SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2
 [17] Biobase_2.36.2             GenomicRanges_1.28.4       GenomeInfoDb_1.12.2        IRanges_2.10.2
 [21] S4Vectors_0.14.3           BiocGenerics_0.22.0

 loaded via a namespace (and not attached):
   [1] shinydashboard_0.6.1     R.utils_2.5.0            htmlwidgets_0.9          RSQLite_2.0
   [5] AnnotationDbi_1.38.2     grid_3.4.1               trimcluster_0.1-2        RNeXML_2.0.7
   [9] DESeq_1.29.0             munsell_0.4.3            codetools_0.2-15         colorspace_1.3-2
  [13] energy_1.7-0             uuid_0.1-2               pspline_1.0-18           robustbase_0.92-7
  [17] bayesm_3.1-0.1           listenv_0.6.0            NMF_0.20.6               tximport_1.4.0
  [21] GenomeInfoDbData_0.99.0  rmote_0.3.4              hwriter_1.3.2            bit64_0.9-7
  [25] rhdf5_2.20.0             EDASeq_2.11.0            diptest_0.75-7           R6_2.2.2
  [29] ggbeeswarm_0.6.0         taxize_0.8.9             locfit_1.5-9.1           flexmix_2.3-14
  [33] bitops_1.0-6             reshape_0.8.7            assertthat_0.2.0         scales_0.4.1
  [37] nnet_7.3-12              beeswarm_0.2.3           gtable_0.2.0             phylobase_0.8.4
  [41] RUVSeq_1.11.0            globals_0.10.2           bold_0.5.0               rlang_0.1.2
  [45] genefilter_1.59.0        splines_3.4.1            rtracklayer_1.36.4       lazyeval_0.2.0
  [49] hexbin_1.27.1            reshape2_1.4.2           GenomicFeatures_1.28.4   httpuv_1.3.5
  [53] tensorA_0.36             tools_3.4.1              gridBase_0.4-7           gplots_3.0.1
  [57] RColorBrewer_1.1-2       servr_0.7                stabledist_0.7-1         Rcpp_0.12.12
  [61] plyr_1.8.4               progress_1.1.2           zlibbioc_1.22.0          purrr_0.2.3
  [65] RCurl_1.95-4.8           prettyunits_1.0.2        viridis_0.4.0            cluster_2.0.6
  [69] crul_0.3.8               magrittr_1.5             data.table_1.10.4        RSpectra_0.12-0
  [73] mvtnorm_1.0-6            whisker_0.3-2            gsl_1.9-10.3             aroma.light_3.7.0
  [77] mime_0.5                 xtable_1.8-2             XML_3.98-1.9             mclust_5.3
  [81] gridExtra_2.2.1          compiler_3.4.1           biomaRt_2.32.1           tibble_1.3.4
  [85] KernSmooth_2.23-15       R.oo_1.21.0              htmltools_0.3.6          segmented_0.5-2.1
  [89] pcaPP_1.9-72             tidyr_0.7.0              geneplotter_1.55.0       howmany_0.3-1
  [93] DBI_0.7                  MASS_7.3-47              fpc_2.1-10               boot_1.3-20
  [97] compositions_1.40-1      ShortRead_1.35.1         Matrix_1.2-11            ade4_1.7-8
 [101] R.methodsS3_1.7.1        gdata_2.18.0             bindr_0.1                pkgconfig_2.0.1
 [105] rncl_0.8.2               GenomicAlignments_1.12.2 registry_0.3             numDeriv_2016.8-1
 [109] locfdr_1.1-8             xml2_1.1.1               rARPACK_0.11-0           annotate_1.55.0
 [113] vipor_0.4.5              rngtools_1.2.4           pkgmaker_0.22            XVector_0.16.0
 [117] stringr_1.2.0            digest_0.6.12            copula_0.999-17          ADGofTest_0.3
 [121] softImpute_1.4           Biostrings_2.44.2        dendextend_1.5.2         edgeR_3.18.1
 [125] curl_2.8.1               kernlab_0.9-25           shiny_1.0.4              Rsamtools_1.28.0
 [129] gtools_3.5.0             modeltools_0.2-21        rjson_0.2.15             nlme_3.1-131
 [133] jsonlite_1.5             bindrcpp_0.2             viridisLite_0.2.0        limma_3.32.5
 [137] lattice_0.20-35          httr_1.3.1               DEoptimR_1.0-8           survival_2.41-3
 [141] glue_1.1.1               png_0.1-7                prabclus_2.2-6           glmnet_2.0-10
 [145] bit_1.1-12               class_7.3-14             stringi_1.1.5            mixtools_1.1.0
 [149] blob_1.1.0               latticeExtra_0.6-28      caTools_1.17.1           memoise_1.1.0
 [153] dplyr_0.7.2              ape_4.1
drisso commented 7 years ago

Hi @davismcc

This looks suspiciously similar to #36 . Do you have whole numbers in your count assay? If that's not the issue, it would be great if you can share a subset of your data for me to investigate further!

davismcc commented 7 years ago

Hi Davide

Indeed, I do not have whole numbers in the count assay. I have used estimated counts from Salmon. Apologies if I missed the stipulation for whole numbers in the docs/workflow.

Obviously I can round my counts for this analysis and I guess that will work OK? In general it might be preferable to allow estimated counts from Salmon/kallisto/etc as possible input? In edgeR, for example, estimated counts work fine - we encourage users to provide raw counts, but the algorithms work perfectly well with non-integer values.

I'll follow up with a subset of the data. What's the best way to send a private link to you?

Best Davis

drisso commented 7 years ago

Hi Davis,

yes, I see your point, and it would definitely be nice to be able to use non-integer values in zinbwave. However, this leads to nasty errors that I can't still figure out. For now, I've decided to add an error upfront so that at least the users don't wait 13 hours before it fails. This is implemented in the new version of the package (0.99.8).

I would recommend that for now you round your counts and it would be great if you let me know if that fixes the problem. I will try to figure out why the algorithm fails with non-integer values, so I'll keep this open.

BTW, because of the git transition of Bioconductor I had to archive this version of the git repo and all the new activity will be at https://github.com/drisso/zinbwave.

davismcc commented 7 years ago

Hi Davide

Error upfront is really sensible. Whatever you decide to/can implement for non-integers, best to know if there's a problem as early as possible.

I'm trying again now with rounded counts, so I'll let you know how that goes - fingers crossed!

I'll note the transition to the other repository - I have to do the same for scater.

Best D.