ziqilau closed 9 years ago
I got the same error on a different machine. It turns out the MovieLens dataset has many samples of all zeros - the first 14574 user entries (columns 0 through 14573) are all zeros.
BIDMach takes the first minibatch of data (about 2000 samples on this dataset) and uses it to compute the density - i.e. non-zeros per column - for the dataset. That assumes the data are stationary, which isn't true in this case. In fact it gets a batch of all zeros, and then tries to allocate matrices of zero size on the GPU, which is illegal.
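To make the failure mode concrete, here is a minimal sketch in plain Scala. It is not BIDMach's actual code: the helper `firstBatchDensity` and the toy column counts are assumptions made up for illustration. It only models the idea of estimating non-zeros per column from the first ~2000 columns when the leading columns are all empty.

```scala
object FirstBatchDensity {
  // Hypothetical helper (not a BIDMach API): average non-zeros per column,
  // measured only over the first `batchSize` columns.
  def firstBatchDensity(nnzPerCol: Array[Int], batchSize: Int): Double = {
    val batch = nnzPerCol.take(batchSize)
    if (batch.isEmpty) 0.0 else batch.sum.toDouble / batch.length
  }

  def main(args: Array[String]): Unit = {
    // Toy data (assumed): 14574 empty columns, then 1000 columns with 20 non-zeros each.
    val nnzPerCol = Array.fill(14574)(0) ++ Array.fill(1000)(20)

    val estimate = firstBatchDensity(nnzPerCol, 2000)        // sees only empty columns
    val overall  = nnzPerCol.sum.toDouble / nnzPerCol.length // density over all columns

    println(s"first-batch estimate = $estimate, overall = $overall")
  }
}
```

In this toy setup the first-batch estimate is exactly zero even though the dataset as a whole is not empty, which is the situation that leads to a zero-size allocation request.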
There are a few workarounds - the first is to simply discard the first 14574 columns:
val aa = a(?,14574->a.ncols)
and then process aa. A more mathematically kosher way to do it is to randomize the columns so the data really are stationary.
val aa = a(?,randperm(a.ncols));
and then train on aa. Strictly speaking you should always randomize the columns of a dataset. In practice it often doesn't matter. Try both ways to be sure.
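The effect of the two workarounds can be sketched on toy data in plain Scala. This is only an illustration under assumptions: `firstNonEmpty` and `permutation` are hypothetical helpers, the per-column counts are made up, and nothing here calls BIDMat. `permutation` plays the role that randperm plays in the snippet above.

```scala
import scala.util.Random

object ColumnWorkarounds {
  // Index of the first column with at least one non-zero, or -1 if all columns are empty.
  def firstNonEmpty(nnzPerCol: Array[Int]): Int = nnzPerCol.indexWhere(_ > 0)

  // A random permutation of the indices 0 until n.
  def permutation(n: Int, rng: Random): Array[Int] =
    rng.shuffle((0 until n).toVector).toArray

  def main(args: Array[String]): Unit = {
    // Toy data (assumed): 14574 empty columns, then 1000 columns with 20 non-zeros each.
    val nnzPerCol = Array.fill(14574)(0) ++ Array.fill(1000)(20)

    // Workaround 1: drop the leading empty columns.
    val start   = firstNonEmpty(nnzPerCol)
    val trimmed = nnzPerCol.drop(start)

    // Workaround 2: visit the columns in random order.
    val perm     = permutation(nnzPerCol.length, new Random(42))
    val shuffled = perm.map(i => nnzPerCol(i))

    println(s"first non-empty column: $start")
    println(s"non-zeros in first 2000 columns, trimmed:  ${trimmed.take(2000).sum}")
    println(s"non-zeros in first 2000 columns, shuffled: ${shuffled.take(2000).sum}")
  }
}
```

Trimming guarantees a non-empty first batch but leaves whatever ordering remains in the data. Shuffling spreads the empty columns across all batches, so a first batch of 2000 columns is very likely to contain some non-zeros, and the total non-zero count is unchanged.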
Hi all,
1) I was using this AMI: http://tleyden.github.io/blog/2014/10/25/cuda-6-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/ for the CUDA environment on an AWS g2.2xlarge instance.
2) After downloading this BIDMach bundle: http://bid2.berkeley.edu/bid-data-project/BIDMach_1.0.0-linux-x86_64.tar.gz, I ran scripts/getdata.sh to download the data.
3) I got stuck at the nn.train call while training on sample data from movielens10M; see the following:
Welcome to Scala version 2.11.2 (OpenJDK 64-Bit Server VM, Java 1.7.0_79).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val a = loadSMat("data/movielens/train1.smat.lz4")
a: BIDMat.SMat =
( 172, 14574) 4
( 187, 14574) 4
( 195, 14574) 4
( 207, 14574) 4
( 215, 14574) 4
( 222, 14574) 3
( 224, 14574) 3
( 226, 14574) 3
... ... ...

scala> val (nn, opts) = NMF.learner(a)
nn: BIDMach.Learner = Learner(BIDMach.datasources.MatDS@36895c35,BIDMach.models.NMF@7404b78b,null,BIDMach.updaters.IncNorm@61ae422e,BIDMach.models.NMF$xopts$4@777b0c1b)
opts: BIDMach.Learner.Options with BIDMach.models.NMF.Opts with BIDMach.datasources.MatDS.Opts with BIDMach.updaters.IncNorm.Opts = BIDMach.models.NMF$xopts$4@777b0c1b

scala> nn.train
corpus perplexity=65134.014613
pass= 0
device is 0
java.lang.RuntimeException: Cuda error in GSMat() too many resources requested for launch
  at BIDMat.GSMat$.apply(GSMat.scala:325)
  at BIDMat.GSMat$.newOrCheckGSMat(GSMat.scala:480)
  at BIDMat.GSMat$.newOrCheckGSMat(GSMat.scala:528)
  at BIDMat.GSMat$.fromSMat(GSMat.scala:409)
  at BIDMat.GSMat$.apply(GSMat.scala:330)
  at BIDMach.models.Model$$anonfun$copyMats$1.apply$mcVI$sp(Model.scala:106)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
  at BIDMach.models.Model.copyMats(Model.scala:94)
  at BIDMach.models.Model.doblockg(Model.scala:73)
  at BIDMach.Learner.retrain(Learner.scala:83)
  at BIDMach.Learner.train(Learner.scala:49)
  ... 33 elided