Closed thomasmooon closed 6 years ago
@thomasmooon maybe you can test whether #11374 effectively solves this RAM consumption issue?
@thomasmooon I just ran your example with num_filter = 32 and no workspace parameter, and the model ran properly on a single 1060, staying stable at around 2.7 GB of RAM on the GPU.
@nswamy Can you close this issue?
thanks @jeremiedb
@jeremiedb I was on vacation leave and just read your posts. Thanks for your suggestion. But in the meantime, a few weeks after I opened the issue, I switched to another DL framework for several reasons.
@thomasmooon Sure, I understand, as the support for the R package hasn't been great. May I ask if there were other specific features you saw as lacking? Thanks!
@jeremiedb Well, in general my experience is that better documentation is desirable, especially minimal reproducible, runnable R examples for each layer / method. Hence, if I were to restart with MXNet, I'd first learn Python and then use the MXNet Python API. This doesn't answer your "specific feature" question; there were / are a lot of small things in my use cases that demanded hacking around a lot with MXNet, whereas in my framework of current choice this is not the case. Special hallmarks of MXNet, like its relatively high speed, are valuable in general of course, but not that critical in my case.
Description
I have a toy dataset of 360 samples with 4096 data points each, leading to a tensor of shape (4096, 1, 360). Hence, each observation has a size of ~4 kB. The CNN is very simple: Conv -> flatten -> fully connected -> fully connected -> softmax.
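A minimal sketch of such a network in the MXNet R API; since the original script is not shown, the kernel size, filter count, and hidden-layer sizes below are illustrative assumptions, not the reporter's actual values:

```r
library(mxnet)

# Conv -> flatten -> FC -> FC -> softmax
# (kernel, num_filter and num_hidden are assumed for illustration)
data <- mx.symbol.Variable("data")
conv <- mx.symbol.Convolution(data, kernel = c(9, 1), num_filter = 10)
act  <- mx.symbol.Activation(conv, act_type = "relu")
flat <- mx.symbol.Flatten(act)
fc1  <- mx.symbol.FullyConnected(flat, num_hidden = 64)
fc2  <- mx.symbol.FullyConnected(fc1, num_hidden = 2)
out  <- mx.symbol.SoftmaxOutput(fc2)
```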
The VRAM consumption explodes depending on the number of filters: please see the table and the related picture below. Regarding the influence of the kernel size and the batch size: these have very little influence; I've tested several combinations, but I omit those details for now. The table measures a setting using 2 GPUs of my environment (described in the environment section below). As one can see, the VRAM demand of each card increases linearly with the number of convolution filters, as expected. But if it exceeds 10, the GPUs run out of their 8 GB of VRAM. What the hell...?
It is also remarkable that a setting with 1 GPU and 8 kernels is not possible: it exhausts the 8 GB RAM of the single card. But using 2 GPUs with everything else unchanged, each GPU consumes only 0.477 GB, so 2 × 0.477 = 0.95 GB in total. This is far below what is consumed when using only 1 card. How can this be??
Other things tested without any effect: the workspace argument of the mx.symbol.Convolution() function. I played with several values: 1, 64, 128, 512 MB. But this had absolutely no effect, regardless of any combination of varying numbers of filters. Here's the definition of workspace:
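For reference, workspace is set per convolution layer. A hedged usage sketch (the layer parameters other than workspace are illustrative assumptions):

```r
library(mxnet)

# workspace caps the temporary buffer (in MB) that the convolution
# implementation may use; values tried in this issue: 1, 64, 128, 512
conv <- mx.symbol.Convolution(
  data       = mx.symbol.Variable("data"),
  kernel     = c(9, 1),    # assumed
  num_filter = 10,         # assumed
  workspace  = 512         # MB; had no observable effect here
)
```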
In addition, I measured the RAM consumption when the device is the CPU, hence no usage of GPUs. I tried values of 10, 11 and 20 filters. What you can see is that the RAM consumption increases linearly, especially when increasing from 10 to 11, rather than exploding as it does when the devices are GPUs. This is confusing. Moreover, the RAM consumption using 10 filters is 9 GB, in line with the observation that the 8 GB of VRAM of one GPU is insufficient. But, again, this contradicts the 0.95 GB consumed when 2 GPUs are used.
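Switching between CPU, one GPU, and two GPUs is done via the ctx argument when fitting the model. A sketch under the assumption that mx.model.FeedForward.create is used (the training arguments are illustrative placeholders):

```r
library(mxnet)

# Device selection; everything else stays unchanged
ctx_cpu  <- mx.cpu()                    # CPU run (monitor with htop)
ctx_1gpu <- mx.gpu(0)                   # single GPU
ctx_2gpu <- list(mx.gpu(0), mx.gpu(1))  # data-parallel over 2 GPUs

# Hypothetical fit call; net, train_x, train_y are not shown in the issue:
# model <- mx.model.FeedForward.create(net, X = train_x, y = train_y,
#                                      ctx = ctx_2gpu, num.round = 10,
#                                      array.batch.size = 36)
```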
For R users, please provide R sessionInfo():

```
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2 mxnet_0.10.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12       compiler_3.4.3     RColorBrewer_1.1-2 influenceR_0.1.0
 [5] plyr_1.8.4         bindr_0.1          viridis_0.4.0      tools_3.4.3
 [9] digest_0.6.12      jsonlite_1.5       tibble_1.3.3       gtable_0.2.0
[13] viridisLite_0.2.0  rgexf_0.15.3       pkgconfig_2.0.1    rlang_0.1.1
[17] igraph_1.1.2       rstudioapi_0.6     yaml_2.1.14        gridExtra_2.2.1
[21] DiagrammeR_0.9.0   dplyr_0.7.2        stringr_1.2.0      htmlwidgets_0.9
[25] grid_3.4.3         glue_1.1.1         R6_2.2.2           Rook_1.1-1
[29] XML_3.98-1.9       ggplot2_2.2.1      magrittr_1.5       codetools_0.2-15
[33] scales_0.4.1       htmltools_0.3.6    assertthat_0.2.0   colorspace_1.3-2
[37] brew_1.0-6         stringi_1.1.5      visNetwork_2.0.0   lazyeval_0.2.0
[41] munsell_0.4.3
```
Hardware
8 × 1080 Ti, 60 GB RAM, 12 cores
cuda version
Minimum reproducible example
Steps to reproduce
Comment / uncomment the lines in the section and use nvidia-smi -l 3 to monitor memory consumption. I recommend running the script from the shell rather than inside R, for convenience (R will crash when the VRAM is exceeded). To measure the RAM consumption using the CPU, comment out the content in this section and monitor e.g. with htop.
What have you tried to solve it?
Varied these parameters:
- workspace: 1, 64, 128, 512, 1024 MB