MarioniLab / FurtherMNN2018

Code for further development of the mutual nearest neighbours batch correction method, as implemented in the batchelor package.
22 stars 6 forks source link

error when gene dimension is high : stop("infinite or missing values in 'x') #11

Closed MengQiuchen closed 5 years ago

MengQiuchen commented 5 years ago

when gene dimension is more than 400, mnnCorrect function report error:

  1. MnnSim2D(xmus, xsds, ymus, ysds, prop1, prop2, nsd = 1, ncells = ncells, . ngenes = ngenes, nsgRNAs = nsgRNAs, seed = 0, Noise = TRUE, . Plot = FALSE)
  2. mnnCorrect(B1, B2, k = 20, sigma = 1, cos.norm.in = FALSE, cos.norm.out = FALSE, . var.adj = TRUE, compute.angle = TRUE) # at line 85-86 of file
  3. find.shared.subspace(ref.basis, t(correction.in[i, , drop = FALSE]))
  4. pracma::orth(B)
  5. svd(M)
  6. stop("infinite or missing values in 'x'")

Thank you very much.

LTLA commented 5 years ago

There's a lot of gaps in your bug report:

Do read this. Otherwise I can only guess as to the causes.

MengQiuchen commented 5 years ago

Sorry, I didn't make it clear. I followed the code in [https://github.com/MarioniLab/MNN2017/tree/master/Simulations] to simulate figure2 in MNN paper. It worked well. But when I changed ngenes in simulateBatches.R from 100 to more than 400. Others are the same. I got this error. I also checked two batches matrix, there's no Na in the input.

LTLA commented 5 years ago

The following works for me.

source("simulateBatches.R") # modified with ngenes=400

load("Sim.RData")
hard <- mnnCorrect(B1, B2i, k=20, sigma=1, cos.norm.in=FALSE, cos.norm.out=FALSE, var.adj=TRUE)
easy <- mnnCorrect(B1, B2ii, k=20, sigma=1, cos.norm.in=FALSE, cos.norm.out=FALSE, var.adj=TRUE)

Note that mnnCorrect will move from scran to batchelor, hence the deprecation warnings.

Session information ``` R version 3.6.0 Patched (2019-05-02 r76458) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.2 LTS Matrix products: default BLAS: /home/luna/Software/R/R-3-6-branch/lib/libRblas.so LAPACK: /home/luna/Software/R/R-3-6-branch/lib/libRlapack.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] scran_1.12.1 SingleCellExperiment_1.6.0 [3] SummarizedExperiment_1.14.0 DelayedArray_0.10.0 [5] BiocParallel_1.18.0 matrixStats_0.54.0 [7] Biobase_2.44.0 GenomicRanges_1.36.0 [9] GenomeInfoDb_1.20.0 IRanges_2.18.0 [11] S4Vectors_0.22.0 BiocGenerics_0.30.0 loaded via a namespace (and not attached): [1] statmod_1.4.32 beeswarm_0.2.3 locfit_1.5-9.1 [4] tidyselect_0.2.5 BiocSingular_1.0.0 purrr_0.3.2 [7] lattice_0.20-38 colorspace_1.4-1 viridisLite_0.3.0 [10] rlang_0.3.4 pillar_1.4.1 glue_1.3.1 [13] dqrng_0.2.1 GenomeInfoDbData_1.2.1 plyr_1.8.4 [16] zlibbioc_1.30.0 munsell_0.5.0 gtable_0.3.0 [19] rsvd_1.0.0 vipor_0.4.5 irlba_2.3.3 [22] scater_1.12.2 BiocNeighbors_1.2.0 Rcpp_1.0.1 [25] edgeR_3.26.4 scales_1.0.0 limma_3.40.2 [28] XVector_0.24.0 gridExtra_2.3 ggplot2_3.1.1 [31] dplyr_0.8.1 grid_3.6.0 tools_3.6.0 [34] bitops_1.0-6 magrittr_1.5 RCurl_1.95-4.12 [37] lazyeval_0.2.2 tibble_2.1.2 dynamicTreeCut_1.63-1 [40] crayon_1.3.4 pkgconfig_2.0.2 Matrix_1.2-17 [43] DelayedMatrixStats_1.6.0 ggbeeswarm_0.6.0 assertthat_0.2.1 [46] viridis_0.5.1 R6_2.4.0 igraph_1.2.4.1 [49] compiler_3.6.0 ```
MengQiuchen commented 5 years ago

I checked Session Information and rewrote the code entirely the same. Problem solved. But the corrected space has many Na. For example, sum(is.na(easy$corrected[[2]])) equals to 3600 when ngenes=400. I find that if we don't add random noise to Batch 2, or add noise with small sd, the numbers of Na will decrease to 0.

LTLA commented 5 years ago

This is caused by a double-precision underflow when performing the Gaussian smoothing. It has now been fixed in the batchelor package (version 1.1.5, BioC-devel - not Bioc-release!). You can also get it from https://github.com/LTLA/batchelor. Keep in mind the output type of batchelor::mnnCorrect is different from scran::mnnCorrect, so you will have to update your scripts for that.

The cost of the fix is that the smoothing is much slower. But better slow than broken, and besides, people should be using fastMNN() rather than mnnCorrect().

LTLA commented 5 years ago

I'm going to assume that the latest version fixed this problem, unless told otherwise.