apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Newly updated Makefile generate path errors #13859

Open lichen11 opened 5 years ago

lichen11 commented 5 years ago

I am following the installation steps (https://mxnet.incubator.apache.org/install/ubuntu_setup.html#install-the-mxnet-package-for-r) to update my mxnet R package to version 1.3 on CentOS 7.6. I am able to finish steps 1-4, but when I run step 5, sudo make rpkg, I receive the following errors:

** testing if installed package can be loaded
[1] "Loading local: inst/libs/libmxnet.so"
Error: package or namespace load failed for ‘mxnet’:
.onLoad failed in loadNamespace() for 'mxnet', details:
call: dyn.load("R-package/inst/libs/libmxnet.so", local = FALSE)
error: unable to load shared object '/home/username/incubator-mxnet/R-package/R-package/inst/libs/libmxnet.so':
libmklml_intel.so: cannot open shared object file: No such file or directory
Error: loading failed
Execution halted
ERROR: loading failed

I am running the command from the incubator-mxnet folder. For some reason, the path contains "R-package" twice. I can find libmxnet.so in /home/username/incubator-mxnet/R-package/inst/libs. Which file should I edit to remove the extra "R-package" from the path?
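The last line of the error above names libmklml_intel.so, a dependency of libmxnet.so, rather than libmxnet.so itself. A minimal sketch for checking which dependencies the dynamic loader cannot resolve, assuming a Linux system where ldd is available; missing_libs is a hypothetical helper name, not part of MXNet:

```shell
#!/bin/sh
# Hypothetical helper (not part of MXNet): print the shared-library
# dependencies of a file that the dynamic loader cannot resolve.
# Prints nothing when every dependency is found.
missing_libs() {
    # ldd lists each dependency; unresolved ones are marked "not found".
    ldd "$1" 2>/dev/null | grep 'not found' || true
}
```

Calling missing_libs R-package/inst/libs/libmxnet.so from the incubator-mxnet folder would be expected to list libmklml_intel.so if that library is not on the loader's search path.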

pengzhao-intel commented 5 years ago

@xinyu-intel please help take a look

TaoLv commented 5 years ago

@lichen11 Just to confirm, are you using the latest master branch or a 1.3.x branch/tag?

xinyu-intel commented 5 years ago

Hi @lichen11, please try adding echo "USE_MKLDNN = 0" >> ./config.mk in step 4 to fix this error.
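Running the echo command more than once appends the same line to config.mk each time. A small sketch of an idempotent variant, assuming the flags in config.mk are written as plain NAME = value lines; set_config_flag is a hypothetical helper, not something the MXNet build provides:

```shell
#!/bin/sh
# Hypothetical helper (not part of MXNet): make sure FILE contains exactly
# one "NAME = VALUE" line for the given flag.
# Assumption: existing assignments use a plain "=", not ":=" or "?=".
set_config_flag() {
    file=$1
    name=$2
    value=$3
    touch "$file"
    # Drop any existing assignment of this flag, then append the new one.
    grep -v "^[[:space:]]*${name}[[:space:]]*=" "$file" > "$file.tmp" || true
    mv "$file.tmp" "$file"
    echo "${name} = ${value}" >> "$file"
}
```

With this, set_config_flag ./config.mk USE_MKLDNN 0 can be repeated safely in an install script.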

lichen11 commented 5 years ago

@TaoLv yes, it is the latest master branch. @xinyu-intel I added the line, but before I could verify whether it fixes the previous error, I ran into another error:
Loading required package: devtools
Rscript -e "if(!require(devtools)||packageVersion('roxygen2') < '6.1.1'){install.packages('roxygen2', repo = 'https://cloud.r-project.org/')}"
Loading required package: devtools
Error in packageVersion("roxygen2") : package ‘roxygen2’ not found
Execution halted
make: *** [rpkg] Error 1

I verified that roxygen2 version 6.1.1 is installed in my R. I am unsure why it gives this error.

lanking520 commented 5 years ago

@mxnet-label-bot add [R, installation]

lichen11 commented 5 years ago

I changed the line
Rscript -e "if(!require(devtools)||packageVersion('roxygen2') < '6.1.1'){install.packages('roxygen2', repo = 'https://cloud.r-project.org/')}"
to
Rscript -e " devtools::install_version('roxygen2',version='6.1.1',\
repos='https://cloud.r-project.org/',quiet=TRUE)"

I am able to perform Step 5 with no error. But when I load mxnet in R, I receive the following error:

Error: package or namespace load failed for ‘mxnet’:
.onLoad failed in loadNamespace() for 'mxnet', details:
call: dyn.load("R-package/inst/libs/libmxnet.so", local = FALSE)
error: unable to load shared object '/home/username/R-package/inst/libs/libmxnet.so':
/home/username/R-package/inst/libs/libmxnet.so: cannot open shared object file: No such file or directory

lichen11 commented 5 years ago

I fixed it using a very odd trick. I noticed that when I run R in the terminal, I am able to load mxnet and train NNs. However, when I use RStudio Server, it gives the above error. I noticed that in RStudio Server, the library paths are:
.libPaths()
[1] "/home/username/R/x86_64-redhat-linux-gnu-library/3.5"
[2] "/usr/lib64/R/library"
[3] "/usr/share/R/library"

But in the R terminal, .libPaths() only contains "/usr/lib64/R/library" and "/usr/share/R/library".

So I just removed "/home/username/R/x86_64-redhat-linux-gnu-library/3.5" as a lib path. But what I noticed is that mxnet is installed in "/home/username/R/x86_64-redhat-linux-gnu-library/3.5", not in the other two paths. It is very odd that I have to remove this lib path (where mxnet is installed) in order to get mxnet working. You guys might want to look into this as well.

hetong007 commented 5 years ago

@lichen11 instead of removing your .libPaths() , how about you install roxygen2 in your Rstudio server too?

lichen11 commented 5 years ago

Yes, I installed roxygen2 on RStudio Server. I am still receiving the error:

Loading required package: mxnet
[1] "Loading local: inst/libs/libmxnet.so"
Error: package or namespace load failed for ‘mxnet’:
.onLoad failed in loadNamespace() for 'mxnet', details:
call: dyn.load("R-package/inst/libs/libmxnet.so", local = FALSE)
error: unable to load shared object '/home/user/R-package/inst/libs/libmxnet.so':
/home/user/R-package/inst/libs/libmxnet.so: cannot open shared object file: No such file or directory

Changing .libPaths() works but I would like to know what the cause is and what I can do to fix the problem permanently.

hetong007 commented 5 years ago

@lichen11 It sounds like you have two different R installations. They may partially share the .libPaths(), which may cause confusion.

Can you stick with one of them, remove mxnet installation (remove the folder) and re-install? If a direct installation doesn't work, remember to try the trick from @xinyu-intel .

lichen11 commented 5 years ago

I checked the R versions. I am using R 3.5.1 (2018-07-02) Feather Spray on both R and R server.

hetong007 commented 5 years ago

I'm saying that your R and R server are different R installations, but they could share some .libPaths(). As a result, some package dependencies may have been installed by different installations, and this has the potential to cause issues.

Please remove mxnet completely, and try to re-install mxnet in one of your installations.

lichen11 commented 5 years ago

I have a question on uninstalling mxnet. I am on a CentOS system. I tried remove.packages('mxnet') but mxnet is still there. I also ran sudo make clean in the incubator-mxnet folder, but I can still load mxnet. I could not find any specific documentation online on how to uninstall the mxnet R package.

hetong007 commented 5 years ago

Check your .libPaths() and remove mxnet folders in each of the paths.
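A rough sketch of that cleanup as a shell function, assuming the library directories reported by .libPaths() are passed in by hand; remove_pkg_dirs is a hypothetical helper name, and it deletes folders, so check the paths before running it:

```shell
#!/bin/sh
# Hypothetical helper: remove the folder PKG from every library directory given.
# Usage: remove_pkg_dirs PKG LIBDIR...
remove_pkg_dirs() {
    pkg=$1
    # Refuse an empty package name so a library directory itself is never removed.
    [ -n "$pkg" ] || return 1
    shift
    for lib in "$@"; do
        if [ -d "$lib/$pkg" ]; then
            echo "removing $lib/$pkg"
            rm -rf -- "$lib/$pkg"
        fi
    done
}
```

With the paths reported earlier in this thread, the call would look like remove_pkg_dirs mxnet /home/username/R/x86_64-redhat-linux-gnu-library/3.5 /usr/lib64/R/library /usr/share/R/library, run with sudo where the system directories require it.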

lichen11 commented 5 years ago

Thanks. I removed the folders and installed new mxnet. I am following the example https://mxnet.incubator.apache.org/versions/master/tutorials/r/mnistCompetition.html but I trained with two GPUs using

model <- mx.model.FeedForward.create(lenet, X=train.array, y=train.y,
    ctx=list(mx.gpu(0), mx.gpu(1)), num.round=100, array.batch.size=100,
    learning.rate=0.05, momentum=0.9, wd=0.00001,
    eval.metric=mx.metric.accuracy,
    epoch.end.callback=mx.callback.log.train.metric(100))

I received the following error:
Auto-select kvstore type = local_update_cpu
Start training with 2 devices
Error in kvstore$set.optimizer(optimizer) : kvstore.cc:124: RCheck failed: names.size() == 2 && names[0] == "create.state" && names[1] == "update" Invalid optimizer

I checked that with either GPU on its own, the code runs without error. Does mxnet R automatically support multiple GPUs?

hetong007 commented 5 years ago

@anirudhacharya do you have references to best practice for multi-GPU training in R?

anirudhacharya commented 5 years ago

@hetong007 I do not have any best practice for multi-gpu training, but this might help @lichen11 's issue - https://github.com/apache/incubator-mxnet/issues/5296#issuecomment-461608335

chris-english commented 5 years ago

Double R-package/R-package still makes cp have trouble finding:
chris@jacie:~/.virtualenvs/mx_cv4/mxnet$ ls lib
libiomp5.so libmkldnn.so.0 libmklml_intel.so libmxnet.a libmxnet.so

chris@jacie:~/.virtualenvs/mx_cv4/mxnet$ workon mx_cv4
(mx_cv4) chris@jacie:~/.virtualenvs/mx_cv4/mxnet$ python
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
>>> mxnet.__version__
'1.5.0'

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

Matrix products: default
BLAS: /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 yaml_2.2.0

chris@jacie:~/.virtualenvs/mx_cv4/mxnet$ make rpkg
Makefile:313: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
mkdir -p R-package/inst/libs
cp src/io/image_recordio.h R-package/src
cp -rf lib/libmxnet.so R-package/inst/libs
mkdir -p R-package/inst/include
cp -rf include/ R-package/inst/include
rm R-package/inst/include/dmlc
rm R-package/inst/include/nnvm
cp -rf 3rdparty/dmlc-core/include/ R-package/inst/include/
cp -rf 3rdparty/tvm/nnvm/include/* R-package/inst/include
Rscript -e "if(!require(devtools)){install.packages('devtools', repo = 'https://cloud.r-project.org/')}"
Loading required package: devtools
Rscript -e "library(devtools); library(methods); options(repos=c(CRAN='https://cloud.r-project.org/')); install_deps(pkg='R-package', dependencies = TRUE)"

cp R-package/dummy.NAMESPACE R-package/NAMESPACE
echo "import(Rcpp)" >> R-package/NAMESPACE
R CMD INSTALL R-package

I also tried echo "USE_MKLDNN = 0" >> ./config.mk but got the same results. What did the solution turn out to be?

jwmueller commented 5 years ago

I have the same issue as @chris-english and @lichen11 (see detailed outputs below), and echo "USE_MKLDNN = 0" >> ./config.mk does not help either.
The issue clearly seems to be the duplicated "R-package/R-package/" in the path where libmxnet.so is supposed to be located, since the file is found at the right path if I remove one of the duplicated "R-package/" components.

Does anyone have a stable solution? I need to programmatically download and install R-MXNet on some remote servers. I'm not exactly sure, but one possible solution is to replace
dyn.load("R-package/inst/libs/libmxnet.so", local = FALSE)
with
dyn.load("inst/libs/libmxnet.so", local = FALSE)
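For illustration, the paths reported in this thread are consistent with one relative path, R-package/inst/libs/libmxnet.so, being resolved against different working directories. The sketch below only joins strings; resolve_from is a hypothetical helper, and which working directory R actually uses during the install-time load test is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path that a relative path names
# when the process's working directory is DIR (string join only, no
# filesystem access).
# Usage: resolve_from DIR RELPATH
resolve_from() {
    printf '%s/%s\n' "${1%/}" "$2"
}

rel="R-package/inst/libs/libmxnet.so"

# Working directory is the repository root: the file's real location.
resolve_from /home/username/incubator-mxnet "$rel"
# -> /home/username/incubator-mxnet/R-package/inst/libs/libmxnet.so

# Working directory is already R-package: the doubled path from the error.
resolve_from /home/username/incubator-mxnet/R-package "$rel"
# -> /home/username/incubator-mxnet/R-package/R-package/inst/libs/libmxnet.so

# Working directory is the home folder: the path from the later load error.
resolve_from /home/username "$rel"
# -> /home/username/R-package/inst/libs/libmxnet.so
```

If that reading is right, any fix that keeps a relative path will still depend on where R is started from.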

Also, the fix merged at #13952 didn't seem to resolve this issue.

Detailed Commands & Output:

sudo make rpkg

Makefile:313: WARNING: Significant performance increases can be achieved by installing and enabling gperftools or jemalloc development packages
mkdir -p R-package/inst/libs
cp src/io/image_recordio.h R-package/src
cp -rf lib/libmxnet.so R-package/inst/libs
if [ -e "lib/libmkldnn.so.0" ]; then \
cp -rf lib/libmkldnn.so.0 R-package/inst/libs; \
cp -rf lib/libiomp5.so R-package/inst/libs; \
cp -rf lib/libmklml_intel.so R-package/inst/libs; \
fi
mkdir -p R-package/inst/include
cp -rl include/* R-package/inst/include
Rscript -e "if(!require(devtools)){install.packages('devtools', repo = 'https://cloud.r-project.org/')}"
Loading required package: devtools
Rscript -e "if(!require(roxygen2)||packageVersion('roxygen2') < '6.1.1'){install.packages('roxygen2', repo = 'https://cloud.r-project.org/')}"
Loading required package: roxygen2
Rscript -e "library(devtools); library(methods); options(repos=c(CRAN='https://cloud.r-project.org/')); install_deps(pkg='R-package', dependencies = TRUE)"

cp R-package/dummy.NAMESPACE R-package/NAMESPACE
echo "import(Rcpp)" >> R-package/NAMESPACE
R CMD INSTALL R-package

sessionInfo()

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

Imshepherd commented 5 years ago

Same here, with "R-package" appearing twice in the path.