Closed fmg-kevin-kilroy closed 8 years ago
It's very hard to support older versions, and the predict functionality was definitely poor back then.
You should be able to compile the current code under CUDA 6.5. We support 6.5 for Tegra TK1 compatibility. The main change is to install the JCuda libraries for CUDA 6.5 in BIDMach/lib, and use the bidmach65 script to start BIDMach. You'll need to recompile the binaries for your platform:
```
cd BIDMach/jni/src
./configure
make installcudalib
```
Then go back to BIDMach/ and run ./sbt package.
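Once it builds, it's worth sanity-checking GPU detection from the bidmach65 REPL. A rough sketch (from memory, so treat the exact names as assumptions; Mat.hasCUDA and GPUmem come from BIDMat):

```scala
import BIDMat.Mat
import BIDMat.SciFunctions._

// Number of CUDA devices BIDMat found at startup; 0 means it fell back to CPU-only mode.
println("CUDA devices: " + Mat.hasCUDA)

if (Mat.hasCUDA > 0) {
  // GPUmem reports (free fraction, free bytes, total bytes) for the current GPU.
  val (freeFrac, freeBytes, totalBytes) = GPUmem
  println(s"GPU memory: $freeBytes of $totalBytes bytes free ($freeFrac free)")
}
```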
Thanks, I'll give it a go.
Almost working; this is from the latest code cloned from master...
First, I had to remove the following from Makefile.inc after running ./configure in jni/src:
`-gencode arch=compute_52,code=sm_52`
Then `make installcudalib` succeeded.
Loading /Users/kevinkilroy/software/biddata/BIDMach/lib/bidmach_init.scala...
import BIDMat.{CMat, CSMat, DMat, Dict, FMat, FND, GMat, GDMat, GIMat, GLMat, GSMat, GSDMat, GND, HMat, IDict, Image, IMat, LMat, Mat, SMat, SBMat, SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{Click, FM, GLM, KMeans, KMeansw, LDA, LDAgibbs, Model, NMF, SFA, RandomForest, SVD}
import BIDMach.networks.Net
import BIDMach.datasources.{DataSource, MatSource, FileSource, SFileSource}
import BIDMach.datasinks.{DataSink, MatSink}
import BIDMach.mixins.{CosineSim, Perplexity, Top, L1Regularizer, L2Regularizer}
import BIDMach.updaters.{ADAGrad, Batch, BatchNorm, Grad, IncMult, IncNorm, Telescoping}
import BIDMach.causal.IPTW
2 CUDA devices found, CUDA version 6.5
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77).
Type in expressions to have them evaluated.
Type :help for more information.
.....
However, when training...
scala> mm.train
corpus perplexity=2.069916
java.lang.RuntimeException: CUDA kernel error 8 in CUMAT.applyop
at BIDMat.GMat.gOp(GMat.scala:849)
at BIDMat.GMat.unary_$minus(GMat.scala:1267)
at BIDMat.GMat.unary_$minus(GMat.scala:19)
at BIDMach.models.GLM.init(GLM.scala:115)
at BIDMach.Learner.init(Learner.scala:62)
at BIDMach.Learner.firstPass(Learner.scala:93)
at BIDMach.Learner.retrain(Learner.scala:81)
at BIDMach.Learner.train(Learner.scala:70)
... 33 elided
It looks like my GPU(s) are working properly:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "GeForce 9600M GT"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 256 MBytes (268107776 bytes)
( 4) Multiprocessors, ( 8) CUDA Cores/MP: 32 CUDA Cores
GPU Clock rate: 1250 MHz (1.25 GHz)
Memory Clock rate: 792 Mhz
Memory Bus Width: 128-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce 9400M"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 254 MBytes (265945088 bytes)
( 2) Multiprocessors, ( 8) CUDA Cores/MP: 16 CUDA Cores
GPU Clock rate: 1100 MHz (1.10 GHz)
Memory Clock rate: 1064 Mhz
Memory Bus Width: 128-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: No with 0 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = GeForce 9600M GT, Device1 = GeForce 9400M
Result = PASS
I then tried to add the following to Makefile.incl to support my GPU architecture: `-gencode arch=compute_11,code=sm_11`. But I get the following make error:
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
ptxas error : Entry function '_Z10__treeWalkPfPiS_S0_S0_S0_S_iiiiiii' uses too much shared data (0x8064 bytes, 0x4000 max)
ptxas error : Entry function '_Z10__treePackIffEvPT_PiPT0_PxS2_iiiii' uses too much shared data (0x80cc bytes, 0x4000 max)
ptxas error : Entry function '_Z10__treePackIiiEvPT_PiPT0_PxS2_iiiii' uses too much shared data (0x80cc bytes, 0x4000 max)
ptxas error : Entry function '_Z10__treePackIfiEvPT_PiPT0_PxS2_iiiii' uses too much shared data (0x80cc bytes, 0x4000 max)
make: *** [Dtree.o] Error 255
Suspecting now that my GPU is just too ancient: CUDA error 8 is cudaErrorInvalidDeviceFunction, i.e. no kernel binary was built for this compute capability, and the ptxas errors above show the tree kernels need about 32 KB of shared memory per block while sm_11 devices only have 16 KB (the 0x4000 max in the message).
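In the meantime I can at least force CPU mode and keep going. Something like this should do it (an untested sketch with placeholder data; I'm assuming the GLM.learner(data, targets, link) factory and the standard useGPU flag on the model options):

```scala
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMach.models.GLM

// Placeholder data: features x examples, with 0/1 targets.
val x = rand(2, 100)
val y = x(0, ?) > 0.5f

// Link 1 assumed to be logistic here; adjust to whatever model you are running.
val (mm, opts) = GLM.learner(x, y, 1)
opts.useGPU = false   // stay on the CPU: no GMat allocation, no CUDA kernels
mm.train
```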
I will close this issue: using 1.0.3 I am getting prediction values back, albeit not via the GPU, but that's a separate issue caused by my old GPU.
Hi, I'm trying to run a simple linear regression model, but the results of nn.predict are always 0.
My matrices/code look like this:
Please note I'm using version 0.97, as my GPU only supports up to CUDA 6.5.
I appreciate that the dataset is small, but I'm trying to reproduce the linear regression model from an online course I'm taking (Andrew Ng's Machine Learning course on Coursera). I've tried increasing the number of passes, but it makes no difference.
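For reference, the shape of what I'm doing is roughly this (simplified, with made-up placeholder data rather than the actual course dataset; I'm assuming link 0 is GLM's linear/least-squares link, that lrate and npasses are the right option names in this version, and that nn.modelmats(0) exposes the fitted weights):

```scala
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMach.models.GLM

// Placeholder data in BIDMach's layout: features x examples, targets 1 x examples.
val x = rand(2, 100)
val y = x(0, ?) * 3f + x(1, ?) * 5f        // synthetic linear targets

val (nn, opts) = GLM.learner(x, y, 0)      // link 0 assumed to be linear regression
opts.npasses = 50                          // many passes, since the dataset is tiny
opts.lrate = 0.01f                         // small learning rate for stability
nn.train

// The fitted weights should end up in the model's first model matrix.
println(nn.modelmats(0))
```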
Any help would be appreciated.
Thanks,