Closed fmg-kevin-kilroy closed 8 years ago
It's very hard to support older versions, and the predict functionality was definitely poor back then.
You should be able to compile the current code under CUDA 6.5. We support 6.5 for Tegra TK1 compatibility. The main change is to install the JCuda libraries for CUDA 6.5 in BIDMach/lib, and use the bidmach65 script to start BIDMach. You'll need to recompile the binaries for your platform:
```
cd BIDMach/jni/src
./configure
make installcudalib
```
Then go back to BIDMach/ and run ./sbt package.
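Once it builds, it's worth sanity-checking GPU detection from the bidmach65 REPL. A rough sketch (from memory, so treat the exact names as assumptions; Mat.hasCUDA and GPUmem come from BIDMat):

```scala
import BIDMat.Mat
import BIDMat.SciFunctions._

// Number of CUDA devices BIDMat found at startup; 0 means it fell back to CPU-only mode.
println("CUDA devices: " + Mat.hasCUDA)

if (Mat.hasCUDA > 0) {
  // GPUmem reports (free fraction, free bytes, total bytes) for the current GPU.
  val (freeFrac, freeBytes, totalBytes) = GPUmem
  println(s"GPU memory: $freeBytes of $totalBytes bytes free ($freeFrac free)")
}
```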
Thanks, I'll give it a go.
Almost working; this is from the latest code cloned from master...
First, I had to remove the following from Makefile.inc after running ./configure in jni/src:
`-gencode arch=compute_52,code=sm_52`
Then `make installcudalib` succeeded.
Loading /Users/kevinkilroy/software/biddata/BIDMach/lib/bidmach_init.scala...
import BIDMat.{CMat, CSMat, DMat, Dict, FMat, FND, GMat, GDMat, GIMat, GLMat, GSMat, GSDMat, GND, HMat, IDict, Image, IMat, LMat, Mat, SMat, SBMat, SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{Click, FM, GLM, KMeans, KMeansw, LDA, LDAgibbs, Model, NMF, SFA, RandomForest, SVD}
import BIDMach.networks.Net
import BIDMach.datasources.{DataSource, MatSource, FileSource, SFileSource}
import BIDMach.datasinks.{DataSink, MatSink}
import BIDMach.mixins.{CosineSim, Perplexity, Top, L1Regularizer, L2Regularizer}
import BIDMach.updaters.{ADAGrad, Batch, BatchNorm, Grad, IncMult, IncNorm, Telescoping}
import BIDMach.causal.IPTW
2 CUDA devices found, CUDA version 6.5
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77).
Type in expressions to have them evaluated.
Type :help for more information.
.....
However, when training...
scala> mm.train
corpus perplexity=2.069916
java.lang.RuntimeException: CUDA kernel error 8 in CUMAT.applyop
at BIDMat.GMat.gOp(GMat.scala:849)
at BIDMat.GMat.unary_$minus(GMat.scala:1267)
at BIDMat.GMat.unary_$minus(GMat.scala:19)
at BIDMach.models.GLM.init(GLM.scala:115)
at BIDMach.Learner.init(Learner.scala:62)
at BIDMach.Learner.firstPass(Learner.scala:93)
at BIDMach.Learner.retrain(Learner.scala:81)
at BIDMach.Learner.train(Learner.scala:70)
... 33 elided
It looks like my GPU(s) are working properly:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "GeForce 9600M GT"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 256 MBytes (268107776 bytes)
( 4) Multiprocessors, ( 8) CUDA Cores/MP: 32 CUDA Cores
GPU Clock rate: 1250 MHz (1.25 GHz)
Memory Clock rate: 792 Mhz
Memory Bus Width: 128-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "GeForce 9400M"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 254 MBytes (265945088 bytes)
( 2) Multiprocessors, ( 8) CUDA Cores/MP: 16 CUDA Cores
GPU Clock rate: 1100 MHz (1.10 GHz)
Memory Clock rate: 1064 Mhz
Memory Bus Width: 128-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: No with 0 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = GeForce 9600M GT, Device1 = GeForce 9400M
Result = PASS
I then tried to add the following to Makefile.incl to support my GPU architecture: `-gencode arch=compute_11,code=sm_11`. But I get the following make error:
nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
ptxas error : Entry function '_Z10__treeWalkPfPiS_S0_S0_S0_S_iiiiiii' uses too much shared data (0x8064 bytes, 0x4000 max)
ptxas error : Entry function '_Z10__treePackIffEvPT_PiPT0_PxS2_iiiii' uses too much shared data (0x80cc bytes, 0x4000 max)
ptxas error : Entry function '_Z10__treePackIiiEvPT_PiPT0_PxS2_iiiii' uses too much shared data (0x80cc bytes, 0x4000 max)
ptxas error : Entry function '_Z10__treePackIfiEvPT_PiPT0_PxS2_iiiii' uses too much shared data (0x80cc bytes, 0x4000 max)
make: *** [Dtree.o] Error 255
Suspecting now that my GPU is just too ancient: CUDA error 8 is cudaErrorInvalidDeviceFunction, i.e. no kernel binary was built for this compute capability, and the ptxas errors above show the tree kernels need about 32 KB of shared memory per block while sm_11 devices only have 16 KB (the 0x4000 max in the message).
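In the meantime I can at least force CPU mode and keep going. Something like this should do it (an untested sketch with placeholder data; I'm assuming the GLM.learner(data, targets, link) factory and the standard useGPU flag on the model options):

```scala
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMach.models.GLM

// Placeholder data: features x examples, with 0/1 targets.
val x = rand(2, 100)
val y = x(0, ?) > 0.5f

// Link 1 assumed to be logistic here; adjust to whatever model you are running.
val (mm, opts) = GLM.learner(x, y, 1)
opts.useGPU = false   // stay on the CPU: no GMat allocation, no CUDA kernels
mm.train
```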
I will close this issue: using 1.0.3 I am getting prediction values back, albeit not via the GPU, but that's a separate issue caused by my old GPU.
Hi, I'm trying to run a simple linear regression model, but the results of nn.predict are always 0.
My matrices/code look like this:
Please note I'm using version 0.97, as my GPU only supports up to CUDA 6.5.
I appreciate that the dataset is small, but I'm trying to reproduce the linear regression model from an online course I'm taking (Andrew Ng's Machine Learning course on Coursera). I've tried increasing the number of passes, but it makes no difference.
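For reference, the shape of what I'm doing is roughly this (simplified, with made-up placeholder data rather than the actual course dataset; I'm assuming link 0 is GLM's linear/least-squares link, that lrate and npasses are the right option names in this version, and that nn.modelmats(0) exposes the fitted weights):

```scala
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMach.models.GLM

// Placeholder data in BIDMach's layout: features x examples, targets 1 x examples.
val x = rand(2, 100)
val y = x(0, ?) * 3f + x(1, ?) * 5f        // synthetic linear targets

val (nn, opts) = GLM.learner(x, y, 0)      // link 0 assumed to be linear regression
opts.npasses = 50                          // many passes, since the dataset is tiny
opts.lrate = 0.01f                         // small learning rate for stability
nn.train

// The fitted weights should end up in the model's first model matrix.
println(nn.modelmats(0))
```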
Any help would be appreciated.
Thanks,