BIDData / BIDMach

CPU and GPU-accelerated Machine Learning Library
BSD 3-Clause "New" or "Revised" License

CUDA alloc failed initialization error with kmeans algorithm #5

Closed. mwiewior closed this issue 10 years ago.

mwiewior commented 10 years ago

Hi all, I'm getting the following error when I try to run k-means clustering on the GPU. The problem occurs in these cases:

1) input 1M records / 10 attributes / k=100 / 10 iterations
2) input 30M records / 10 attributes / k=10 / 10 iterations

but it works for:

1) input 10M records / 10 attributes / k=10 / 10 iterations

You reported running k-means on a 100M-record dataset with a GTX 680, which according to the specifications has 2 GB of RAM, so I think it should also work in my case (GTX 860M, 2 GB). Or am I missing something?

Also, do you know why my card is reported as "1 CUDA device found, CUDA version 5.5" while I'm running CUDA 6.0?

Regards, Marek

java.lang.RuntimeException: CUDA alloc failed initialization error
    at BIDMat.GMat$.apply(GMat.scala:1094)
    at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1780)
    at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1814)
    at BIDMat.GMat$.apply(GMat.scala:1100)
    at BIDMach.models.ClusteringModel.init(Clustering.scala:21)
    at BIDMach.models.KMeans.init(KMeans.scala:34)
    at BIDMach.Learner.init(Learner.scala:37)
    at BIDMach.Learner.train(Learner.scala:45)
    at .(:26)
    at .()
    at .(:7)
    at .()
    at $print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
    at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
    at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
    at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
    at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
    at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

jcanny commented 10 years ago

This problem is very small and shouldn't cause any memory problems. I'll hazard a guess and say your input was transposed compared to what BIDMach expects. BIDMach follows Matlab/Fortran in using column-major order: each input instance should be a column, and the rows should index features. So your 1M-instance x 10-feature problem should be input as a 10 x 1M matrix. This is the opposite of Python (and R, I think).
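For illustration, a rough sketch of the fix in the bidmach shell, where the standard imports are preloaded. The file name data.fmat is hypothetical, and I'm assuming the (FMat, Int) form of KMeans.learner from the tutorials; adapt to however you load your data:

    // Sketch only: assumes the bidmach shell with its usual preloaded imports
    // (BIDMat.MatFunctions._, BIDMach.models.KMeans, ...).
    // "data.fmat" is a hypothetical file holding instances as ROWS.
    val a = loadFMat("data.fmat")             // 1000000 x 10: one row per instance
    val at = a.t                              // 10 x 1000000: one column per instance, as BIDMach expects
    val (nn, opts) = KMeans.learner(at, 100)  // k = 100 centers
    opts.npasses = 10                         // 10 passes over the data
    nn.train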

If you input a 1M x 10 matrix with k=100, KMeans will see 1M features and 100 centers and build a 1M x 100 model matrix (roughly half a gigabyte). Once you add the derivatives etc., you quickly run out of memory. In fact it's happening during initialization, which is a good clue that the model matrix is too big.
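To make the arithmetic concrete (back-of-the-envelope only, assuming 4-byte single-precision floats, which is what FMat/GMat store):

    // Model matrix size: (number of features) x (number of centers) x 4 bytes
    val bytesPerFloat = 4L
    val untransposed = 1000000L * 100 * bytesPerFloat  // 1M "features" x 100 centers = 400,000,000 bytes
    val transposed   = 10L * 100 * bytesPerFloat       // 10 features x 100 centers  = 4,000 bytes
    println(s"untransposed: ${untransposed / 1000000} MB vs transposed: $transposed bytes")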

Re: CUDA 5.5: the "full bundle" of BIDMach includes the CUDA 5.5 runtime on Linux. That's what it will use by default, and what it should report when you start it.

Your card doesn't have a CUDA version, and it's possible to have multiple versions of the CUDA software installed on the same system. What gets reported depends on which of them you run/link against.
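If you want to check this yourself, here is a small sketch using JCuda (the binding BIDMach builds on). cudaRuntimeGetVersion and cudaDriverGetVersion are standard CUDA runtime API calls; treat the exact numbers as illustrative:

    import jcuda.runtime.JCuda

    // CUDA versions are encoded as 1000*major + 10*minor, e.g. 5050 = 5.5
    def fmt(v: Int) = s"${v / 1000}.${(v % 1000) / 10}"

    val rt  = new Array[Int](1)
    val drv = new Array[Int](1)
    JCuda.cudaRuntimeGetVersion(rt)   // the runtime actually linked (the bundled 5.5)
    JCuda.cudaDriverGetVersion(drv)   // the highest version the installed driver supports (e.g. 6.0)
    println(s"runtime ${fmt(rt(0))}, driver ${fmt(drv(0))}")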

Let me know if the transpose fixed your problem so I can close this issue.

-John

mwiewior commented 10 years ago

Thanks a lot! Transposing did the trick. Actually I hadn't checked the results; I was focused on performance, which is why I didn't spot that you follow column-major order. You can close the ticket. Thank you once again!