broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Running inferCNV on a large matrix triggers an error #251

Closed. honghh2018 closed this issue 2 years ago.

honghh2018 commented 4 years ago

Hi @all, thanks for this great tool for studying CNV in scRNA-seq data. I am running inferCNV on a large matrix with 106685 columns and 35599 rows. After about 7 days of runtime, when the program reached step 21 (denoising), it triggered the error below:

    Error in rasterImage(as.raster(zc), min(x), min(y), max(x), max(y), interpolate = FALSE) :
      Unable to allocate memory block with size 16777216 TB
    Calls: two_steps_inferCNV ... heatmap.cnv -> image -> image.default -> rasterImage
    Stop execution.

How can I fix this error? Any advice would be appreciated. Best, hanhuihong

GeorgescuC commented 4 years ago

Hi @honghh2018 ,

What operating system are you using? How much memory do you have in total, and how much is R allowed to use? Which version of infercnv are you using? You could try adding the option useRaster=FALSE. I am not sure it will help, but it is worth a try, because it looks like an error I encountered when testing plots scaled to chromosome size, where R would inexplicably request huge amounts of memory in some cases, seemingly at random. The run should pick up at the denoise step, so it should be much faster to test than the full run; however, with no raster optimizations it will be slower than usual.
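For reference, a minimal sketch of resuming the run with rasterization disabled. This assumes your infercnv version exposes a useRaster argument on run(), and that the same out_dir from the interrupted run is reused so completed steps are picked up from the backup objects; the file paths and reference group name below are hypothetical placeholders for your own setup:

    library(infercnv)

    # Recreate the infercnv object exactly as in the original run
    infercnv_obj <- CreateInfercnvObject(
        raw_counts_matrix = "counts_matrix.txt",      # hypothetical path
        annotations_file  = "cell_annotations.txt",   # hypothetical path
        gene_order_file   = "gene_order.txt",         # hypothetical path
        ref_group_names   = c("normal")               # hypothetical reference group
    )

    # Re-run pointing at the same out_dir so earlier steps are reused;
    # useRaster = FALSE avoids the rasterImage() allocation in the heatmap.
    infercnv_obj <- infercnv::run(
        infercnv_obj,
        cutoff    = 0.1,
        out_dir   = "output_dir",   # same out_dir as the interrupted run
        denoise   = TRUE,
        HMM       = FALSE,
        useRaster = FALSE
    )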

Regards, Christophe.

honghh2018 commented 4 years ago

Thanks @GeorgescuC, I will try again with the parameter useRaster=FALSE in my inferCNV run and hope it helps. I am running on CentOS 7.3 with 1 TB of memory, and the R version is 3.6.0. The R environment details are shown below:

    CLICOLOR_FORCE  1
    DISPLAY  :0
    EDITOR  vi
    GIT_ASKPASS  rpostback-askpass
    HOME  /home/honghh
    LANG  zh_CN.UTF-8
    LD_LIBRARY_PATH  /usr/lib64/R/lib::/lib:/usr/lib/jvm/jre/lib/amd64/server:/usr/lib/jvm/jre/lib/amd64:/usr/lib/jvm/java/lib/amd64:/usr/java/packages/lib/amd64:/lib:/usr/lib
    LN_S  ln -s
    LOGNAME  honghh
    MAKE  make
    PAGER  /usr/bin/less
    PATH  /home/honghh/.local/share/r-miniconda/envs/r-reticulate/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lib/rstudio-server/bin/rpostback:/usr/lib/rstudio-server/bin/postback
    R_BROWSER  /usr/bin/xdg-open
    R_BZIPCMD  /usr/bin/bzip2
    R_DOC_DIR  /usr/share/doc/R-3.6.0
    R_GZIPCMD  /usr/bin/gzip
    R_HOME  /usr/lib64/R
    R_INCLUDE_DIR  /usr/include/R
    R_LIBS_SITE  /usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib64/R/library:/usr/share/R/library
    R_LIBS_USER  ~/R/x86_64-redhat-linux-gnu-library/3.6
    R_PAPERSIZE  a4
    R_PDFVIEWER  /usr/bin/xdg-open
    R_PLATFORM  x86_64-redhat-linux-gnu
    R_PRINTCMD  lpr
    R_RD4PDF  times,hyper
    R_SESSION_INITIALIZED  PID=310366:NAME="reticulate"
    R_SESSION_TMPDIR  /tmp/RtmplMWShZ
    R_SHARE_DIR  /usr/share/R
    R_STRIP_SHARED_LIB  strip --strip-unneeded
    R_STRIP_STATIC_LIB  strip --strip-debug
    R_SYSTEM_ABI  linux,gcc,gxx,gfortran,gfortran
    R_TEXI2DVICMD  /usr/bin/texi2dvi
    R_UNZIPCMD  /usr/bin/unzip
    R_ZIPCMD  /usr/bin/zip
    RCPP_PARALLEL_NUM_THREADS  1
    RMARKDOWN_MATHJAX_PATH  /usr/lib/rstudio-server/resources/mathjax-26
    RS_RPOSTBACK_PATH  /usr/lib/rstudio-server/bin/rpostback
    RSTUDIO  1
    RSTUDIO_CONSOLE_COLOR  256
    RSTUDIO_CONSOLE_WIDTH  107
    RSTUDIO_HTTP_REFERER  http://192.168.3.33:8787/
    RSTUDIO_PANDOC  /usr/lib/rstudio-server/bin/pandoc
    RSTUDIO_SESSION_STREAM  honghh-d
    RSTUDIO_USER_IDENTITY  honghh
    RSTUDIO_WINUTILS  bin/winutils
    SED  /usr/bin/sed
    SSH_ASKPASS  rpostback-askpass
    TAR  /usr/bin/gtar
    TERM  xterm-256color
    USER  honghh

honghh2018 commented 4 years ago

Additionally @GeorgescuC, how can I find out how much memory R is allowed to use?

GeorgescuC commented 4 years ago

Hi @honghh2018 ,

If you do not have job restrictions when you run R, it should be allowed as much memory as running "ulimit -a" in a shell reports, with "unlimited" meaning all physically available memory.
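As a quick check, here is a small sketch (not part of infercnv itself) for inspecting those limits and the total physical memory from within an R session on Linux; the exact output format depends on your system:

    # Shell resource limits visible to this R session (ulimit is a shell builtin,
    # so it runs through the shell that system() spawns)
    system("ulimit -a")

    # Total physical memory on Linux, read from /proc/meminfo (reported in kB)
    meminfo <- readLines("/proc/meminfo")
    mem_total_kb <- as.numeric(sub("[^0-9]*([0-9]+).*", "\\1",
                                   meminfo[grep("^MemTotal:", meminfo)]))
    mem_total_kb / (1024^2)   # total RAM in GB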

The issue here is that, for some unknown reason, R requests far more memory than it should ever need, and probably few computers in the world have that much (16777216 TB), if any. When I was testing the scaled plots, I was increasing the size of the matrix to be displayed so that every heatmap dot would cover a fixed number of base pairs. The issue you are seeing would pop up in some tests with smaller final matrices to display (so more base pairs per dot) but not with bigger ones (fewer base pairs per dot), both starting from the same input matrix.

Regards, Christophe.

honghh2018 commented 4 years ago

Thanks @GeorgescuC. It seems to me that the memory allocation error happens at random, or is a bug in R. I also have a question about the running time: in my case the program has been running for 7 days and nights without the HMM module. Is there any advice for speeding it up? I hope parallel threads can be added. Regards, hanhui

GeorgescuC commented 4 years ago

Hi @honghh2018 ,

What options are you currently using? If you are not using the HMM, having analysis_mode set to "subclusters" would, for example, slow down the process significantly (especially with the default partition method, random_trees). You can save some time and memory on reading the input matrix by first running scripts/prepare_sparsematrix.R on it, which creates an R object with the matrix in sparse format using a more efficient method for reading files.
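To illustrate the same idea, here is a hand-rolled sketch (not the actual scripts/prepare_sparsematrix.R script, whose exact interface is not shown here): read the counts once into a sparse matrix, save it as an .rds, and pass the loaded object to CreateInfercnvObject instead of the text file. This assumes your infercnv version accepts a matrix object for raw_counts_matrix, and all file paths and the reference group name are hypothetical:

    library(Matrix)
    library(data.table)

    # Read the counts once with data.table (fast); genes in rows, cells in columns,
    # with the first column holding gene names used as rownames
    counts <- as.matrix(fread("counts_matrix.txt"), rownames = 1)   # hypothetical path

    # Store as a sparse matrix and save it, so later runs skip the slow text parse
    counts_sparse <- Matrix(counts, sparse = TRUE)
    saveRDS(counts_sparse, "counts_sparse.rds")

    # In the analysis script, load the .rds and hand the object to infercnv directly
    counts_sparse <- readRDS("counts_sparse.rds")
    infercnv_obj <- infercnv::CreateInfercnvObject(
        raw_counts_matrix = counts_sparse,                  # matrix object instead of file path
        annotations_file  = "cell_annotations.txt",         # hypothetical path
        gene_order_file   = "gene_order.txt",               # hypothetical path
        ref_group_names   = c("normal")                     # hypothetical reference group
    )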

Parallelization exists, but only for the subclustering by random trees and the Bayesian filtering at this time.
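If you do use those steps, the worker count is set on the run call; a minimal sketch, assuming the argument is named num_threads as in current infercnv versions and reusing the hypothetical object and out_dir from above:

    # Enable the parallelized steps with more worker threads
    infercnv_obj <- infercnv::run(
        infercnv_obj,
        cutoff        = 0.1,
        out_dir       = "output_dir",
        denoise       = TRUE,
        HMM           = TRUE,              # Bayesian filtering only applies to HMM predictions
        analysis_mode = "subclusters",     # random_trees subclustering is one parallelized step
        num_threads   = 8                  # number of parallel workers
    )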

Regards, Christophe.