ceolium closed this issue 7 years ago.
Can you show your log when compiling mxnet?
ps -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda-8.0/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMSHADOW_USE_CUDNN=1 -I/root/mxnet/cub -DMXNET_USE_NVRTC=0 -MMD -c src/operator/convolution_v1.cc -o build/src/operator/convolution_v1.o
g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -I/root/mxnet/mshadow/ -I/root/mxnet/dmlc-core/include -fPIC -I/root/mxnet/nnvm/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda-8.0/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=1 -I/usr/include/opencv -fopenmp -DMSHADOW_USE_CUDNN=1 -I/root/mxnet/cub -DMXNET_USE_NVRTC=0 -MMD -c src/operator/correlation.cc -o build/src/operator/correlation.o
g++ -std=c++11 -c -DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -I/root/mxnet/mshadow/ -I/root/mxnet/dmlc-core/include -fPIC -I/root/mxnet/nnvm/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda-8.0/include -DMSHADOW_USE_CBL
::op::ActivationParam) [with DType=mshadow::half::half_t]"
src/operator/activation.cu(27): here
src/operator/./cudnn_activation-inl.h(137): warning: variable "beta" was declared but never referenced
detected during:
instantiation of "void mxnet::op::CuDNNActivationOp<DType>::Backward(const mxnet::OpContext &, const std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob>> &, const std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob>> &, const std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob>> &, const std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType>> &, const std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob>> &, const std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob>> &) [with DType=mshadow::half::half_t]"
(44): here
src/operator/./cudnn_convolution-inl.h(286): error: too few arguments in function call
1 error detected in the compilation of "/tmp/tmpxft_00007035_00000000-5_convolution_v1.cpp4.ii".
make: *** [build/src/operator/convolution_v1_gpu.o] Error 2
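The failure is a "too few arguments in function call" error inside cudnn_convolution-inl.h. One plausible cause is a mismatch between the installed cuDNN headers and the cuDNN API this MXNet checkout expects: cuDNN v6, the version installed here, added a data-type argument to cudnnSetConvolution2dDescriptor. The following is a minimal Python sketch for confirming which cuDNN version the build actually sees. It assumes cudnn.h sits in /usr/local/cuda-8.0/include (the include path in the log above) and defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 5 and 6 headers do. cudnn_version is a hypothetical helper, not part of MXNet.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 6"; the \s+ after the
        # name keeps e.g. CUDNN_VERSION from being picked up by mistake.
        m = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if m is None:
            raise ValueError("%s not found in header" % name)
        fields[name] = int(m.group(1))
    return fields["CUDNN_MAJOR"], fields["CUDNN_MINOR"], fields["CUDNN_PATCHLEVEL"]


if __name__ == "__main__":
    # Assumed location: the CUDA include directory passed to the compiler in the log.
    header = Path("/usr/local/cuda-8.0/include/cudnn.h")
    if header.exists():
        print("cuDNN %d.%d.%d" % cudnn_version(header.read_text()))
    else:
        print("cudnn.h not found at", header)
```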
Can you tell me which AWS instance you used?
It's ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-20160627 (ami-2d39803a) [type g2.8xlarge]
It seems like the code is using GPU resources, because when I run 'nvidia-smi' on the server, it reports that I am using the GPU. But when I ran it on another, properly working Ubuntu server (also 14.04 LTS), it showed messages like the following:
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  1459016  78.0    2637877 140.9  1459016  78.0
Vcells 14991815 114.4   22282032 170.0 14991815 114.4
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[11:31:31] /home/mining/mxnet/dmlc-core/include/dmlc/logging.h:235: [11:31:31] src/operator/./convolution-inl.h:370: Check failed: ksizey <= dshape[2] + 2 * param.pad[0] && ksizex <= dshape[3] + 2 * param.pad[1] kernel size exceed input
[11:31:31] /home/mining/mxnet/dmlc-core/include/dmlc/logging.h:235: [11:31:31] src/operator/./convolution-inl.h:370: Check failed: ksizey <= dshape[2] + 2 * param.pad[0] && ksizex <= dshape[3] + 2 * param.pad[1] kernel size exceed input
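The check in that log line can be reproduced outside MXNet to see which layer and input shape trip it. The following is a minimal Python sketch that assumes NCHW data shapes (batch, channel, height, width) and no dilation. check_conv_shape is a hypothetical helper, not an MXNet function. It applies the same condition as the logged check and then returns the usual convolution output size.

```python
def check_conv_shape(dshape, kernel, pad=(0, 0), stride=(1, 1)):
    """Apply the logged kernel-size check, then return the output (height, width).

    dshape is assumed to be (batch, channel, height, width); dilation is ignored.
    """
    ksizey, ksizex = kernel
    # Same condition as the "Check failed" line in convolution-inl.h.
    if not (ksizey <= dshape[2] + 2 * pad[0] and ksizex <= dshape[3] + 2 * pad[1]):
        raise ValueError(
            "kernel size exceed input: kernel=%s, input HxW=%s, pad=%s"
            % (tuple(kernel), tuple(dshape[2:]), tuple(pad))
        )
    # Standard output size for a convolution without dilation.
    out_h = (dshape[2] + 2 * pad[0] - ksizey) // stride[0] + 1
    out_w = (dshape[3] + 2 * pad[1] - ksizex) // stride[1] + 1
    return out_h, out_w
```

For example, check_conv_shape((1, 1, 28, 28), (5, 5)) returns (24, 24). By contrast, a hypothetical input that was never reshaped from a flat 784-vector, with shape (1, 1, 1, 784), fails the check because a kernel 5 pixels tall does not fit into a height of 1.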
I am closing this since it has been inactive for quite a while. Feel free to reopen if necessary.
Environment info
Operating System: Amazon EC2 Ubuntu 14.04 LTS
Compiler: gcc
Package used (Python/R/Scala/Julia): R
MXNet version:
Or if installed from source:
MXNet commit hash (git rev-parse HEAD):
If you are using python package, please provide
Python version and distribution:
If you are using R package, please provide
R sessionInfo(): 3.3.3 RC
I am trying to install the MXNet R package on Amazon Web Services EC2 (Ubuntu 14.04 LTS) by following the instructions at http://mxnet.io/get_started/ubuntu_setup.html.
First, I downloaded the CUDA 8 toolkit from NVIDIA.
Then I downloaded the latest cuDNN file (cudnn-8.0-linux-x64-v6.0.tgz) and transferred it to the EC2 instance with scp.
In the EC2 console (accessed via SSH), I typed
Then I cloned the MXNet source from git, created a config.mk file, and modified it to set USE_CUDA=1 and related options (for GPU usage). I then moved to the setup-utils directory and ran the Ubuntu R install shell script.
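The config.mk edits described above can also be scripted so that every rebuild uses identical settings. The following is a minimal Python sketch. It assumes config.mk uses the 'NAME = value' lines of MXNet's make/config.mk template. USE_CUDA, USE_CUDA_PATH and USE_CUDNN are that template's GPU options. The path /usr/local/cuda-8.0 is taken from the include flag in the build log, not from the original post. set_make_vars and GPU_SETTINGS are hypothetical names used only for illustration.

```python
import re
from pathlib import Path

# Assumed values: CUDA path inferred from "-I/usr/local/cuda-8.0/include" in the log.
GPU_SETTINGS = {
    "USE_CUDA": "1",
    "USE_CUDA_PATH": "/usr/local/cuda-8.0",
    "USE_CUDNN": "1",
}


def set_make_vars(text, updates):
    """Return config.mk text with each 'NAME = value' line replaced, or appended if absent."""
    for name, value in updates.items():
        # Anchor at line start so commented-out lines ("# USE_CUDA ...") are skipped;
        # requiring whitespace or "=" after the name keeps USE_CUDA from matching USE_CUDA_PATH.
        pattern = re.compile(r"^%s\s*:?=.*$" % re.escape(name), re.MULTILINE)
        line = "%s = %s" % (name, value)
        if pattern.search(text):
            text = pattern.sub(lambda _m: line, text, count=1)
        else:
            text = text.rstrip("\n") + "\n" + line + "\n"
    return text


if __name__ == "__main__":
    path = Path("config.mk")  # assumed: run from the mxnet source root
    if path.exists():
        path.write_text(set_make_vars(path.read_text(), GPU_SETTINGS))
        print("updated", path)
    else:
        print("config.mk not found; copy make/config.mk to the source root first")
```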
FYI, I confirmed with the 'nvidia-smi' command that the NVIDIA driver is installed properly.
I launched R and entered:
library(mxnet)
The output was:
Rcpp Init>
I ran some test code for MXNet and it worked fine, so I proceeded to run code that uses the GPU (LeNet):
This is basic tutorial code from the MXNet website.
But I got the following error messages:
I want to make sure:
I modified the config.mk file 'before' I actually compiled with the 'bash install-mxnet-ubuntu-r.sh' command. I changed the environment variables in as many ways as I could, and repeated the steps above at least 7 times. My final goal is to run a script containing the MXNet LeNet code as a batch job (R CMD BATCH ~.R). I would really appreciate it if someone could help me solve this problem.