Allow for CUDA backend

cdeterman commented 6 years ago

It has occurred to me, while toying with gpuRcuda, that I should be able to find a way to simply let the user indicate that they wish to use the CUDA backend. After all, ViennaCL is intended to provide an interface for both OpenCL and CUDA. This was not pursued previously because Rcpp and the nvcc compiler were not playing nicely together. That has changed very recently with some changes to Rcpp.

If I succeed in this, then gpuRcuda will no longer be relevant. Instead, I would work to interface with relevant CUDA extensions like cublas with gpuRcublas directly from gpuR.

cdeterman commented 6 years ago

I believe I nearly have this working for Linux builds (and theoretically Mac OS X). However, I don't believe this will be possible on Windows, because NVIDIA's nvcc compiler requires Visual Studio. That is not supported by R and causes all sorts of other problems. As such, until NVIDIA allows Windows to use the MinGW toolset, the CUDA backend for gpuR will likely be limited to Linux systems.

cdeterman commented 6 years ago

@PengZhao @rhaunschild you two are users who have a noted interest in CUDA. Could you try to install the cuda branch of this repository to confirm whether it compiles nicely for you? From R you should just need the following commands:

Sys.setenv(BACKEND="CUDA")
devtools::install_github("cdeterman/gpuR", ref = "cuda")

If it works, I would encourage you to clone the repository branch and run the unit tests by opening an R session in the git directory and running:

devtools::test()

dselivanov commented 6 years ago

Congrats @cdeterman! I've tried to install it but got this error:

In file included from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/meta/result_of.hpp:41:0,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/scalar.hpp:29,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/tools/entry_proxy.hpp:27,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/detail/matrix_def.hpp:26,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp:26,
                 from ../inst/include/gpuR/dynVCLMat.hpp:26,
                 from ../inst/include/gpuR/getVCLptr.hpp:5,
                 from gpuMatrix_igemm.cpp:4:
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/Core:58:34: fatal error: math_functions.hpp: No such file or directory
compilation terminated.

I have latest RcppEigen_0.3.3.4.0 installed.

Full traceback:

* installing *source* package ‘gpuR’ ...
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking how to run the C++ preprocessor... g++ -E
Checking for C++ Compiler
checking whether we are using the GNU C++ compiler... (cached) yes
checking whether g++ accepts -g... (cached) yes
configure: "BACKEND = CUDA"
checking "Checking environment variable CUDA_HOME"... "CUDA_HOME not set; using highest version found /usr/local/cuda-9.1"
checking for /usr/local/cuda-9.1/bin/nvcc... yes
"NVCC found"
checking "whether this is the 64 bit linux version of CUDA"... checking for /usr/local/cuda-9.1/lib64/libcudart.so... yes
"yes -- using /usr/local/cuda-9.1/lib64 for CUDA libs"
checking for Rscript... yes
checking "building the nvcc command line"...
configure: "Acquiring R compiler flags"
configure: Building Makevars
configure: creating ./config.status
config.status: creating src/Makevars
** libs
/usr/local/cuda-9.1/bin/nvcc -gencode arch=compute_30,code=sm_30 -std=c++11 -DGPU -x cu -c -Xcompiler "-fPIC" -Xcudafe "--diag_suppress=boolean_controlling_expr_is_constant --diag_suppress=code_is_unreachable" --expt-relaxed-constexpr -I. -I../inst/include -DBACKEND_CUDA -I/usr/share/R/include -I/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include -I"/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include" -I/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include -I/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/BH/include gpuMatrix_igemm.cpp -o gpuMatrix_igemm.o
In file included from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/meta/result_of.hpp:41:0,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/scalar.hpp:29,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/tools/entry_proxy.hpp:27,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/detail/matrix_def.hpp:26,
                 from /home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp:26,
                 from ../inst/include/gpuR/dynVCLMat.hpp:26,
                 from ../inst/include/gpuR/getVCLptr.hpp:5,
                 from gpuMatrix_igemm.cpp:4:
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/Core:58:34: fatal error: math_functions.hpp: No such file or directory
compilation terminated.
Makevars:40: recipe for target 'gpuMatrix_igemm.o' failed
make: *** [gpuMatrix_igemm.o] Error 1
ERROR: compilation failed for package ‘gpuR’
* removing ‘/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/gpuR’
Warning message:
In i.p(...) : installation of package ‘/tmp/RtmpjNapzd/remotes4fd42f6b7122/cdeterman-gpuR-5b29276’ had non-zero exit status

EDIT: here is an SO post found after some quick googling.
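The configure output in this comment reports that CUDA_HOME was not set and that the highest version found under /usr/local was used instead. For anyone who wants to pin the toolkit before installing, here is a minimal shell sketch of that kind of fallback. `pick_cuda_home` is a hypothetical helper written for illustration and is not taken from gpuR's configure script; `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper (not part of gpuR): print the highest-versioned
# cuda-* directory under a prefix, similar in spirit to the
# "using highest version found" message in the configure log.
pick_cuda_home() {
    prefix="${1:-/usr/local}"
    # sort -V orders version strings naturally, so 9.1 sorts before 10.0
    ls -d "$prefix"/cuda-* 2>/dev/null | sort -V | tail -n 1
}

# Keep an existing CUDA_HOME; otherwise fall back to the detected directory.
CUDA_HOME="${CUDA_HOME:-$(pick_cuda_home /usr/local)}"
export CUDA_HOME
echo "CUDA_HOME=$CUDA_HOME"
```

An R session started from the same shell inherits this value, so the Sys.setenv(BACKEND="CUDA") and install_github commands from earlier in the thread would see it.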

cdeterman commented 6 years ago

Thanks @dselivanov, what version of RcppEigen do you have installed?

dselivanov commented 6 years ago

The latest, RcppEigen_0.3.3.4.0. After symlinking (as was suggested on SO) with

sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp

I got another batch of errors:

/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(96): error: identifier "x" is undefined
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(138): error: identifier "x" is undefined
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(138): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(223): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(223): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(276): error: class "__half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(372): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(378): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(387): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(387): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(571): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(571): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(603): error: class "Eigen::half" has no member "x"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/Half.h(614): warning: function "__shfl_xor(float, int, int)"
/usr/local/cuda-9.1/bin/../targets/x86_64-linux/include/sm_30_intrinsics.hpp(295): here was declared deprecated ("__shfl_xor() is deprecated in favor of __shfl_xor_sync() and may be removed in a future release (Use -Wno-deprecated-declarations to suppress this warning).")
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include/Eigen/src/Core/arch/CUDA/PacketMathHalf.h(102): error: more than one conversion function from "const __half" to a built-in type applies:
            function "__half::operator short() const"
            function "__half::operator unsigned short() const"
            function "__half::operator int() const"
            function "__half::operator unsigned int() const"
            function "__half::operator long long() const"
            function "__half::operator unsigned long long() const"
            function "__half::operator __nv_bool() const"
14 errors detected in the compilation of "/tmp/tmpxft_00005a44_00000000-6_gpuMatrix_igemm.cpp1.ii".

cdeterman commented 6 years ago

Hmm... odd, it continues to work with my CUDA 8.0 on ubuntu 14.04 docker image. I will try with CUDA 9.1 (as I see that is your version) on ubuntu 16.04 and see what happens.

dselivanov commented 6 years ago

It seems it is indeed related to cuda 9.1 https://github.com/tensorflow/tensorflow/issues/15389

znmeb commented 6 years ago

I recently acquired a laptop with an NVidia 1050Ti. It's currently running Windows 10 Pro with Hyper-V, Docker for Windows and Windows Subsystem for Linux. I have CUDA 9.0 without Visual Studio. I haven't tried anything involving nvcc yet though; TensorFlow 1.5.0 runs fine on it.

I am doing a lot of work with Docker at the moment but have no immediate plans to dual-boot the machine. I think Hyper-V can see the GPU and export it to a guest VM; I'll try this in a day or so.

I will probably end up with both Visual Studio 2015 and 2017; the Microsoft R Client uses 2017 and the NVidia FORTRAN compiler uses 2015.

cdeterman commented 6 years ago

@dselivanov I can confirm the same problem on my docker image. I have found a way to resolve the initial problem (so you don't need to symlink), but I am looking into why the half errors are happening, which again appears to be an issue between Eigen and CUDA >= 9.

cdeterman commented 6 years ago

@dselivanov I have created forks of the BH and RcppEigen packages that I have updated using the most recent changes in the sources (i.e. boostorg and Eigen) to support CUDA >= 9. Please try installing directly from my github with

devtools::install_github('cdeterman/BH')
devtools::install_github('cdeterman/RcppEigen')

and try to install gpuR again. I have successfully compiled in my docker image (ubuntu 16.04, cuda 9.1).

dselivanov commented 6 years ago

I got another error:

/usr/local/cuda-9.1/bin/nvcc -gencode arch=compute_30,code=sm_30 -std=c++11 -DGPU -x cu -c -Xcompiler "-fPIC" -Xcudafe "--diag_suppress=boolean_controlling_expr_is_constant --diag_suppress=code_is_unreachable" --expt-relaxed-constexpr -I. -I../inst/include -DBACKEND_CUDA -I/usr/share/R/include -I/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include -I"/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RcppEigen/include" -I/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include -I/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/BH/include solve.cpp -o solve.o
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/traits/size.hpp(164): error: class "Eigen::Map, 0, Eigen::OuterStride<-1>>" has no member "size1"
          detected during:
            instantiation of "viennacl::vcl_size_t viennacl::traits::size1(const MatrixType &) [with MatrixType=Eigen::Map, 0, Eigen::OuterStride<-1>>]"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp(1178): here
            instantiation of "void viennacl::copy(const viennacl::matrix &, CPUMatrixT &) [with CPUMatrixT=Eigen::Map, 0, Eigen::OuterStride<-1>>, NumericT=float, F=viennacl::row_major, AlignmentV=1U]"
../inst/include/gpuR/dynEigenMat.hpp(428): here
            instantiation of "void dynEigenMat::value, void>::type>::to_host(viennacl::matrix &) [with T=float]"
solve.cpp(64): here
            instantiation of "void cpp_gpuMatrix_solve(SEXP, SEXP, __nv_bool, __nv_bool, int) [with T=float]"
solve.cpp(103): here

/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/meta/result_of.hpp(142): error: class "Eigen::Map, 0, Eigen::OuterStride<-1>>" has no member "size_type"
          detected during:
            instantiation of class "viennacl::result_of::size_type [with T=Eigen::Map, 0, Eigen::OuterStride<-1>>]"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp(1186): here
            instantiation of "void viennacl::copy(const viennacl::matrix &, CPUMatrixT &) [with CPUMatrixT=Eigen::Map, 0, Eigen::OuterStride<-1>>, NumericT=float, F=viennacl::row_major, AlignmentV=1U]"
../inst/include/gpuR/dynEigenMat.hpp(428): here
            instantiation of "void dynEigenMat::value, void>::type>::to_host(viennacl::matrix &) [with T=float]"
solve.cpp(64): here
            instantiation of "void cpp_gpuMatrix_solve(SEXP, SEXP, __nv_bool, __nv_bool, int) [with T=float]"
solve.cpp(103): here

/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/traits/size.hpp(202): error: class "Eigen::Map, 0, Eigen::OuterStride<-1>>" has no member "size2"
          detected during:
            instantiation of "viennacl::result_of::size_type::type viennacl::traits::size2(const MatrixType &) [with MatrixType=Eigen::Map, 0, Eigen::OuterStride<-1>>]"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp(1186): here
            instantiation of "void viennacl::copy(const viennacl::matrix &, CPUMatrixT &) [with CPUMatrixT=Eigen::Map, 0, Eigen::OuterStride<-1>>, NumericT=float, F=viennacl::row_major, AlignmentV=1U]"
../inst/include/gpuR/dynEigenMat.hpp(428): here
            instantiation of "void dynEigenMat::value, void>::type>::to_host(viennacl::matrix &) [with T=float]"
solve.cpp(64): here
            instantiation of "void cpp_gpuMatrix_solve(SEXP, SEXP, __nv_bool, __nv_bool, int) [with T=float]"
solve.cpp(103): here

/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/traits/size.hpp(164): error: class "Eigen::Map, 0, Eigen::OuterStride<-1>>" has no member "size1"
          detected during:
            instantiation of "viennacl::vcl_size_t viennacl::traits::size1(const MatrixType &) [with MatrixType=Eigen::Map, 0, Eigen::OuterStride<-1>>]"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp(1178): here
            instantiation of "void viennacl::copy(const viennacl::matrix &, CPUMatrixT &) [with CPUMatrixT=Eigen::Map, 0, Eigen::OuterStride<-1>>, NumericT=double, F=viennacl::row_major, AlignmentV=1U]"
../inst/include/gpuR/dynEigenMat.hpp(428): here
            instantiation of "void dynEigenMat::value, void>::type>::to_host(viennacl::matrix &) [with T=double]"
solve.cpp(64): here
            instantiation of "void cpp_gpuMatrix_solve(SEXP, SEXP, __nv_bool, __nv_bool, int) [with T=double]"
solve.cpp(106): here

/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/meta/result_of.hpp(142): error: class "Eigen::Map, 0, Eigen::OuterStride<-1>>" has no member "size_type"
          detected during:
            instantiation of class "viennacl::result_of::size_type [with T=Eigen::Map, 0, Eigen::OuterStride<-1>>]"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp(1186): here
            instantiation of "void viennacl::copy(const viennacl::matrix &, CPUMatrixT &) [with CPUMatrixT=Eigen::Map, 0, Eigen::OuterStride<-1>>, NumericT=double, F=viennacl::row_major, AlignmentV=1U]"
../inst/include/gpuR/dynEigenMat.hpp(428): here
            instantiation of "void dynEigenMat::value, void>::type>::to_host(viennacl::matrix &) [with T=double]"
solve.cpp(64): here
            instantiation of "void cpp_gpuMatrix_solve(SEXP, SEXP, __nv_bool, __nv_bool, int) [with T=double]"
solve.cpp(106): here

/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/traits/size.hpp(202): error: class "Eigen::Map, 0, Eigen::OuterStride<-1>>" has no member "size2"
          detected during:
            instantiation of "viennacl::result_of::size_type::type viennacl::traits::size2(const MatrixType &) [with MatrixType=Eigen::Map, 0, Eigen::OuterStride<-1>>]"
/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/RViennaCL/include/viennacl/matrix.hpp(1186): here
            instantiation of "void viennacl::copy(const viennacl::matrix &, CPUMatrixT &) [with CPUMatrixT=Eigen::Map, 0, Eigen::OuterStride<-1>>, NumericT=double, F=viennacl::row_major, AlignmentV=1U]"
../inst/include/gpuR/dynEigenMat.hpp(428): here
            instantiation of "void dynEigenMat::value, void>::type>::to_host(viennacl::matrix &) [with T=double]"
solve.cpp(64): here
            instantiation of "void cpp_gpuMatrix_solve(SEXP, SEXP, __nv_bool, __nv_bool, int) [with T=double]"
solve.cpp(106): here

6 errors detected in the compilation of "/tmp/tmpxft_000077eb_00000000-6_solve.cpp1.ii".
Makevars:40: recipe for target 'solve.o' failed
make: *** [solve.o] Error 1
ERROR: compilation failed for package ‘gpuR’
* removing ‘/home/dselivanov/R/x86_64-pc-linux-gnu-library/3.4/gpuR’
Warning message:
In i.p(...) : installation of package ‘/tmp/RtmpPlThUs/remotes71dc19c963d7/cdeterman-gpuR-5b29276’ had non-zero exit status

cdeterman commented 6 years ago

@dselivanov Bah... I also forgot that you need my github version of RViennaCL. I am waiting on some pull requests with ViennaCL before I release the updates.

devtools::install_github('cdeterman/RViennaCL')

Then try once more.

dselivanov commented 6 years ago

Successfully installed (and the package loads normally)!

library(gpuR)
#  - context device index: 0
#    - GeForce GTX 680
#checked all devices
#completed initialization
#gpuR 2.0.2
#Attaching package: ‘gpuR’
#The following objects are masked from ‘package:base’:

#    colnames, pmax, pmin, svd

However, devtools::test() prints:

Error in dyn.load(dllfile) : unable to load shared object '/home/dselivanov/projects/gpuR/src/gpuR.so': /home/dselivanov/projects/gpuR/src/gpuR.so: undefined symbol: cudaMemcpyAsync
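cudaMemcpyAsync is provided by the CUDA runtime library (libcudart), so one way to narrow this down is to look at what the freshly built shared object is linked against. The thread does not say what caused the error, so treat the following as a diagnostic sketch only. `links_cudart` is a hypothetical helper, and the default path is the one from the error message shortened to the checkout directory.

```shell
#!/bin/sh
# Hypothetical check: does `ldd` output (read on stdin) mention libcudart?
links_cudart() {
    grep -q 'libcudart\.so'
}

so="${1:-src/gpuR.so}"   # adjust to where devtools built the library
if [ -f "$so" ]; then
    if ldd "$so" | links_cudart; then
        echo "$so lists libcudart as a dependency"
    else
        echo "$so does not list libcudart; CUDA runtime symbols may stay unresolved"
    fi
    # Show the CUDA symbols the library expects another library to provide.
    nm -D --undefined-only "$so" | grep ' cuda' || true
fi
```

If libcudart is missing from the list, comparing the link line used by devtools::test() with the one used by the regular install would be a reasonable next step.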

mjmg commented 6 years ago

Is there an easier way to switch between OpenCL and CUDA backends?

Right now it seems that you have to uninstall/reinstall/rebuild packages between

Sys.setenv(BACKEND="CUDA")
devtools::install_github("cdeterman/gpuR", ref = "cuda")

and regular install.packages

install.packages("gpuR")

It would be nice if there could be one build for both platforms linked to both OpenCL and CUDA libraries and switch context using some flags.

cdeterman commented 6 years ago

@mjmg I would ideally prefer to have the ability to switch between both backends but I don't think it is possible with ViennaCL. @karlrupp is it possible to use both OpenCL and CUDA concurrently with ViennaCL? If so, perhaps something could be done here.

karlrupp commented 6 years ago

Yes, it is absolutely possible to switch between CUDA and OpenCL backends in ViennaCL at runtime. That's one of the strengths of ViennaCL over other libraries.

cdeterman commented 6 years ago

@karlrupp are there any examples of this? For example, how could I create two matrices one in OpenCL and one in CUDA?

jonpeake commented 5 years ago

@cdeterman I tried installing cuda-backed gpuR but can't get any of the cuda functions to work. For example, if I try creating a vclMatrix, I receive the error

Error in vectorToMatVCL(data, nrow, ncol, 8L, context_index - 1) : 
  /home/jonpeake/R/x86_64-pc-linux-gnu-library/3.5/RViennaCL/include/viennacl/linalg/cuda/matrix_operations.hpp(334): : getLastCudaError() CUDA error 48: no kernel image is available for execution on the device @ matrix_row_assign_kernel

Same thing happens if I try to multiply two gpuMatrix objects:

Error in cpp_gpuMatrix_elem_prod(A@address, is(A, "vclMatrix"), B@address,  : 
  /home/jonpeake/R/x86_64-pc-linux-gnu-library/3.5/RViennaCL/include/viennacl/linalg/cuda/matrix_operations.hpp(334): : getLastCudaError() CUDA error 48: no kernel image is available for execution on the device @ matrix_row_assign_kernel

My GPU info is below, running CUDA 10.0 on Ubuntu 18.04:

> gpuInfo()
$deviceName
[1] "GeForce GTX 960"

$deviceVendor
[1] "NVIDIA"

$majorVersion
[1] 5

$minorVersion
[1] 2

$numberOfMultiProcs
[1] 8

$sharedMemPerBlock
[1] 49152

$regsPerBlock
[1] 65536

$warpSize
[1] 32

$deviceMemory
[1] 4236902400

$deviceConstMemory
[1] 65536

$clockFreq
[1] 1253000

$double_support
[1] TRUE

Any idea what's going on?
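For reference, CUDA reports "no kernel image is available" when the compiled library contains no code that the device can run. The build log posted earlier in this thread (from a different machine) compiled with -gencode arch=compute_30,code=sm_30, while gpuInfo() above reports compute capability 5.2. Whether such a mismatch played any role here is an assumption; the thread does not confirm it. A small sketch of deriving the matching flag from the two version numbers, with `gencode_flag` as a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: build an nvcc -gencode flag from a compute
# capability given as major and minor version numbers.
gencode_flag() {
    major="$1"; minor="$2"
    printf '%s\n' "-gencode arch=compute_${major}${minor},code=sm_${major}${minor}"
}

# majorVersion and minorVersion as reported by gpuInfo() in this comment.
gencode_flag 5 2   # prints: -gencode arch=compute_52,code=sm_52
```

How gpuR's configure script would accept a different architecture flag is not shown in this thread.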

jonpeake commented 5 years ago

@cdeterman Never mind! I figured out that RStudio doesn't automatically import system environment variables if it is opened from the desktop; I needed to open it from the command line for them to be imported.
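An alternative to launching RStudio from a terminal is to record the variables in ~/.Renviron, which R reads at startup however the session was launched. The comment does not say which variables were missing, so the CUDA_HOME name in the usage line below is only an example, and `add_renviron` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: add NAME=value to an Renviron file unless a line
# for NAME is already present. The file defaults to ~/.Renviron.
add_renviron() {
    name="$1"; value="$2"; file="${3:-$HOME/.Renviron}"
    touch "$file"
    if ! grep -q "^${name}=" "$file"; then
        printf '%s=%s\n' "$name" "$value" >> "$file"
    fi
}

# Example usage (assumed variable name and path; uncomment to apply):
# add_renviron CUDA_HOME /usr/local/cuda
```

Restart the R session afterwards and check the value with Sys.getenv() before reinstalling.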

cdeterman / gpuR

Allow for CUDA backend #110