BIDData / BIDMach

CPU and GPU-accelerated Machine Learning Library
BSD 3-Clause "New" or "Revised" License

"CUDA alloc failed initialization error" when calling mm.train the 2nd time #9

Closed zygmuntz closed 10 years ago

zygmuntz commented 10 years ago

I go through the quickstart example on Windows 7. When I try to call mm.train the second time, I get the following error. I need to exit bidmach and run it anew to be able to train again.

    scala> mm.train
    corpus perplexity=5582,125391
    pass= 0
      2,00%, ll=-0,693, gf=0,116, secs=6,7, GB=0,02, MB/s= 2,86, GPUmem=0,03
     16,00%, ll=-0,134, gf=0,630, secs=15,0, GB=0,12, MB/s= 8,10, GPUmem=0,03
     30,00%, ll=-0,123, gf=0,825, secs=21,9, GB=0,22, MB/s=10,16, GPUmem=0,02
     44,00%, ll=-0,102, gf=0,930, secs=28,7, GB=0,33, MB/s=11,31, GPUmem=0,02
     58,00%, ll=-0,094, gf=0,995, secs=35,6, GB=0,43, MB/s=12,04, GPUmem=0,02
     72,00%, ll=-0,074, gf=1,040, secs=42,4, GB=0,53, MB/s=12,49, GPUmem=0,02
     87,00%, ll=-0,085, gf=1,075, secs=49,1, GB=0,63, MB/s=12,89, GPUmem=0,02
    100,00%, ll=-0,069, gf=1,097, secs=55,8, GB=0,73, MB/s=13,02, GPUmem=0,02
    Time=55,8000 secs, gflops=1,10

    scala> mm.train
    corpus perplexity=5582,125391
    java.lang.RuntimeException: CUDA alloc failed initialization error
        at BIDMat.GMat$.apply(GMat.scala:1094)
        at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1780)
        at BIDMat.GMat$.newOrCheckGMat(GMat.scala:1814)
        at BIDMat.GMat$.apply(GMat.scala:1100)
        at BIDMach.models.RegressionModel.init(Regression.scala:29)
        at BIDMach.models.GLM.init(GLM.scala:25)
        at BIDMach.Learner.init(Learner.scala:37)
        at BIDMach.Learner.train(Learner.scala:45)
        at .(:26)
        at .()
        at .(:7)
        at .()
        at $print()
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
        at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
        at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
        at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
        at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
        at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
        at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
        at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

jcanny commented 10 years ago

The Learner reports the free GPU memory (GPUmem) at the end of each progress line. It's showing only 2-3% left during the first pass, so it's definitely going to run out of memory on the second pass. The GPU has no native garbage collector, and allocation is so expensive that using one is probably not practical. Instead we use a cache, which you can clear manually.

Type:

    resetGPU; Mat.clearCaches

to clear the cache and the GPU's allocator.
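If you want to check the reported memory programmatically rather than by eye, something like the following might help. This is only a sketch in plain Scala: `GpuMemCheck` and `freeFraction` are hypothetical names, not part of BIDMach, and the 5% threshold is an arbitrary assumption. It relies only on the progress-line format shown in the log above, including the comma used as the decimal separator.

```scala
// Hypothetical helper, not part of BIDMach: extract the GPUmem field from one
// progress line. The pattern accepts either "," or "." as the decimal separator.
object GpuMemCheck {
  private val GpuMem = """GPUmem=(\d+)[.,](\d+)""".r

  /** Fraction of GPU memory reported as free on a progress line, if present. */
  def freeFraction(line: String): Option[Double] =
    GpuMem.findFirstMatchIn(line).map(m => s"${m.group(1)}.${m.group(2)}".toDouble)

  def main(args: Array[String]): Unit = {
    // Last progress line of the first training run, copied from the log.
    val last = "100,00%, ll=-0,069, gf=1,097, secs=55,8, GB=0,73, MB/s=13,02, GPUmem=0,02"
    freeFraction(last) match {
      case Some(f) if f < 0.05 => // 0.05 is an assumed threshold, tune as needed
        println(f"only ${f * 100}%.0f%% of GPU memory free")
      case Some(f) =>
        println(f"${f * 100}%.0f%% of GPU memory free")
      case None =>
        println("no GPUmem field found")
    }
  }
}
```

A value as low as the one in the log is the cue to run the reset command above before calling mm.train again.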

We may automate this inside the Learner in the next release. The downside is that it will clear any other arrays already residing in the GPU's memory.
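To make the trade-off concrete, here is a toy model of a manually cleared buffer cache. It is a sketch under loud assumptions, not BIDMat's implementation: host arrays stand in for GPU buffers, the names `BufferCache`, `newOrCached` and `clear` are hypothetical, and it assumes a second training run requests its buffers under new keys, so nothing from the first run is reused.

```scala
import scala.collection.mutable

// Toy model only: host arrays stand in for GPU buffers. All names here are
// hypothetical and do not correspond to BIDMat's API.
final class BufferCache(capacity: Long) {
  private val cache = mutable.Map.empty[(String, Int), Array[Float]]
  private var used = 0L

  /** Return the buffer cached under (tag, size), allocating it on first use. */
  def newOrCached(tag: String, size: Int): Array[Float] =
    cache.getOrElseUpdate((tag, size), {
      if (used + size > capacity)
        throw new RuntimeException("alloc failed: " + used + " of " + capacity + " in use")
      used += size
      new Array[Float](size)
    })

  /** Drop every cached buffer, including ones unrelated to the current model. */
  def clear(): Unit = { cache.clear(); used = 0L }

  /** Fraction of the simulated device memory that is still unallocated. */
  def freeFraction: Double = (capacity - used).toDouble / capacity
}

object BufferCacheDemo {
  def main(args: Array[String]): Unit = {
    val gpu = new BufferCache(capacity = 100)
    gpu.newOrCached("other-array", 10)   // an unrelated array living on the device
    gpu.newOrCached("run1-weights", 88)  // buffers from the first training run
    println(gpu.freeFraction)            // very little memory left

    // A second run asking for new buffers would fail at this point:
    // gpu.newOrCached("run2-weights", 88)

    gpu.clear()                          // frees run1 buffers AND "other-array"
    gpu.newOrCached("run2-weights", 88)  // now succeeds
  }
}
```

In this model, nothing is freed until `clear()` is called, which keeps repeated requests cheap but lets stale buffers pile up, and clearing is all-or-nothing, which is the downside described above.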