BIDData / BIDMat

A CPU and GPU-accelerated matrix library for data mining
BSD 3-Clause "New" or "Revised" License

Memory issues with Java Runtime object ("memory gets allocated randomly") #32

Closed DanielTakeshi closed 9 years ago

DanielTakeshi commented 9 years ago

John,

I've been having some problems debugging memory allocation with BIDMach (BayesNet.scala specifically). I'm using the Java Runtime class which should be a reliable measure of memory allocation. But in the following test script, I'm noticing some weird results:

    import java.text.NumberFormat

    def computeMemory = {
        val runtime = Runtime.getRuntime()
        val format = NumberFormat.getInstance()
        val sb = new StringBuilder()
        val maxMemory = runtime.maxMemory()
        val allocatedMemory = runtime.totalMemory()
        val freeMemory = runtime.freeMemory()
        sb.append("free memory: " + format.format(freeMemory / (1024*1024)) + "M   ");
        sb.append("allocated/total memory: " + format.format(allocatedMemory / (1024*1024)) + "M\n");
        print(sb.toString())
    }

    for (i <- 0 until 100) {
        val a = rand(67,4367)
        //Thread sleep 3000
        println("memory at iteration i = " + i)
        computeMemory
    }

The computeMemory function prints the free memory and the total memory (the initial heap size is set with -Xms). The loop then creates a bunch of random matrices and prints the memory as it goes.
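For reference, here is a Java version of the same measurement with one twist: requesting a GC before each sample. This is a sketch of my own (not part of the original script), and System.gc() is only a hint to the JVM, so it is best-effort, but it tends to give more stable before/after readings:

```java
// Sketch: sample JVM heap usage after requesting a GC, so transient
// garbage is less likely to distort the reading. Assumes nothing about
// BIDMat; this only uses the java.lang.Runtime API from the script above.
public class MemSample {
    static long usedMemoryMB() {
        Runtime rt = Runtime.getRuntime();
        System.gc();  // hint only: the JVM may ignore or delay this
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        return usedBytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        long before = usedMemoryMB();
        byte[] big = new byte[64 * 1024 * 1024];  // allocate ~64 MB and keep it live
        long after = usedMemoryMB();
        System.out.println("delta ~ " + (after - before) + " MB");
        if (big[0] == 0) big[0] = 1;  // prevent the allocation from being optimized away
    }
}
```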

The output is as follows:

dhcp-46-165:BIDMach danielseita$ ./bidmach test.ssc 
Loading /Users/danielseita/BIDMach/lib/bidmach_init.scala...
import BIDMat.{CMat, CSMat, DMat, Dict, FMat, FND, GMat, GDMat, GIMat, GLMat, GSMat, GSDMat, HMat, IDict, Image, IMat, LMat, Mat, SMat, SBMat, SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM, GLM, KMeans, KMeansw, LDA, LDAgibbs, Model, NMF, SFA, RandomForest}
import BIDMach.networks.DNN
import BIDMach.datasources.{DataSource, MatDS, FilesDS, SFilesDS}
import BIDMach.mixins.{CosineSim, Perplexity, Top, L1Regularizer, L2Regularizer}
import BIDMach.updaters.{ADAGrad, Batch, BatchNorm, IncMult, IncNorm, Telescoping}
import BIDMach.causal.IPTW
1 CUDA device found, CUDA version 7.0

Loading test.ssc...
import java.text.NumberFormat
computeMemory: Unit
memory at iteration i = 0
free memory: 13,534M   allocated/total memory: 13,739M
memory at iteration i = 1
free memory: 13,534M   allocated/total memory: 13,739M
memory at iteration i = 2
free memory: 13,462M   allocated/total memory: 13,739M
memory at iteration i = 3
free memory: 13,462M   allocated/total memory: 13,739M

// More of the same...

free memory: 13,462M   allocated/total memory: 13,739M
memory at iteration i = 66
free memory: 13,462M   allocated/total memory: 13,739M
memory at iteration i = 67
free memory: 13,389M   allocated/total memory: 13,739M
memory at iteration i = 68
free memory: 13,389M   allocated/total memory: 13,739M
memory at iteration i = 69

// More of the same ...

It looks like the amount of free memory drops at seemingly random times (by about 72-73 MB each time). But in fact, when you add up the memory allocated across all 100 iterations for these 67 x 4367 non-cached matrices (roughly (67 x 4367 x 8) / (1024 x 1024) MB per matrix, assuming one element takes about 8 bytes), the final free memory makes sense. The free memory value just does not seem to get updated frequently enough.
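The arithmetic above works out as follows (a quick check, using the 8-bytes-per-element assumption from the text):

```java
// Back-of-the-envelope check of the allocation figures quoted in the post:
// a 67 x 4367 matrix at 8 bytes/element, times 100 loop iterations.
public class AllocMath {
    static double perMatrixMB() {
        long rows = 67, cols = 4367, bytesPerElem = 8;
        return (rows * cols * bytesPerElem) / (1024.0 * 1024.0);
    }

    static double totalMB() {
        return 100 * perMatrixMB();  // 100 loop iterations
    }

    public static void main(String[] args) {
        System.out.printf("per matrix: %.2f MB, total over 100 iterations: %.1f MB%n",
                perMatrixMB(), totalMB());
    }
}
```

So each matrix is about 2.2 MB, and the observed ~72-73 MB drops correspond to roughly 32 iterations' worth of allocations being accounted for at once.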

So my question: do you know if there is some esoteric detail about memory allocation in BIDMach/BIDMat data structures that would cause memory allocation to behave weirdly and not show up in the Java Runtime? Normally this wouldn't be too big of a problem, but I'm trying to debug matrix caching problems with Gibbs sampling. For that, it would be nice to confidently point out the places in the code where memory gets allocated, rather than seeing "random" drops in free memory. Adding a thread sleep to delay the measurement does not seem to affect this.

It's likely that this is more of a Java Runtime problem or some timing issue between Runtime and Scala, so I'll probably try printing out GUIDs of matrices to help me debug instead, but I just wanted to check. This happens both on my laptop and in stout.

jcanny commented 9 years ago

Hi Daniel, Debugging caching is easiest with the following steps:

  1. Run the algorithm CPU-only, with caching off. If the algorithm has a GPU-only kernel, create a "harness" around the GPU kernel that copies the input matrices from CPU to GPU, computes the results, copies them back, and finally frees the GPU matrices.
  2. Then run on the CPU with caching on. If the results differ, there is a caching problem; otherwise something else is going on. You can debug it with Eclipse.
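The comparison in step 2 can be sketched generically. This is a hypothetical stand-in (`runModel` is not BIDMach code): the idea is just to run the same deterministic computation with caching off and on and compare results element-wise:

```java
import java.util.Arrays;

// Hypothetical harness for the caching check: run a deterministic
// computation twice, once allocating fresh buffers each pass (caching off)
// and once reusing a buffer across passes (caching on), then compare.
public class CacheCheck {
    // Stand-in for the real algorithm; 'cached' toggles buffer reuse.
    static double[] runModel(boolean cached) {
        double[] scratch = new double[8];
        double[] out = new double[8];
        for (int pass = 0; pass < 3; pass++) {
            if (!cached) scratch = new double[8];  // caching off: fresh buffer per pass
            for (int i = 0; i < out.length; i++) {
                scratch[i] = i * 0.5;     // a correct kernel fully overwrites scratch
                out[i] = scratch[i] * 2.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean same = Arrays.equals(runModel(false), runModel(true));
        System.out.println(same ? "caching OK" : "caching bug");
    }
}
```

A real caching bug would show up here as a difference between the two runs, typically because a reused buffer was read before being fully rewritten.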

That will catch problems in caching generally. In case it's a GPU-specific problem:

  1. Optional: I just did this: I added an option in Learner that turns on memory debugging (Mat.debugMem = true) on the second pass over a dataset. The reason for this is that caching should allocate all matrices during the first pass, so any large matrix allocated after that is almost surely a caching error. Mat.debugMem = true will cause an exception to be thrown at the location of the problem, which should be easy to fix. The option is debugMem in Learner.Options.
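The idea behind such a guard can be sketched as follows. This is a deliberately simplified hypothetical, not BIDMat's actual implementation of Mat.debugMem:

```java
// Hypothetical sketch of a debug-memory guard in the spirit of Mat.debugMem:
// once the flag is set (e.g. after the first pass over the data), any
// "large" allocation throws, pinpointing the call site of a caching miss.
public class DebugMem {
    static boolean debugMem = false;
    static final long LARGE = 1L << 20;  // 1M elements; threshold is illustrative

    static float[] alloc(long nElems) {
        if (debugMem && nElems > LARGE) {
            throw new RuntimeException(
                    "large allocation of " + nElems + " elems with debugMem on");
        }
        return new float[(int) nElems];
    }

    public static void main(String[] args) {
        alloc(2L << 20);   // first pass: fine, debugging is off
        debugMem = true;   // second pass: large allocations now fail fast
        try {
            alloc(2L << 20);
        } catch (RuntimeException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```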

Java has a garbage collector so free memory will go up and down randomly when the GC runs.
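Relatedly, Runtime.freeMemory() is coarse-grained even between GC runs: HotSpot hands heap out to threads in larger chunks (thread-local allocation buffers), so the reported value moves in steps rather than per allocation. That is my reading of the step-like drops in the transcript, and it is easy to demonstrate in isolation:

```java
import java.util.HashSet;
import java.util.Set;

// Demonstration: Runtime.freeMemory() changes in coarse steps, not once per
// allocation, because the JVM carves the heap into larger internal chunks.
public class FreeMemSteps {
    static int distinctReadings(int n) {
        Runtime rt = Runtime.getRuntime();
        Set<Long> readings = new HashSet<>();
        byte[][] keep = new byte[n][];       // keep allocations live
        for (int i = 0; i < n; i++) {
            keep[i] = new byte[128];
            readings.add(rt.freeMemory());   // sample after every allocation
        }
        return readings.size();
    }

    public static void main(String[] args) {
        int n = 10000;
        System.out.println("allocations: " + n
                + ", distinct freeMemory readings: " + distinctReadings(n));
    }
}
```

On a typical HotSpot JVM the number of distinct readings is far smaller than the number of allocations, which matches the "free memory only updates occasionally" behavior reported above.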

-John


DanielTakeshi commented 9 years ago

Thanks for the debugging suggestions. I will try these out, especially the third point. I'll close this issue for now (and hopefully it'll be useful as a future reference).

Just to clarify, I set up Java to start with 14 GB of heap (using -Xms14G), so the garbage collector should not (to my knowledge) run until we get close to allocating that amount of memory.
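One way to confirm those flags took effect (a small sketch of mine; run it under the same -Xms/-Xmx settings):

```java
// Print the JVM's view of its heap limits. With -Xms14G -Xmx14G, both the
// total (committed) heap and the max heap should report roughly 14 GB
// at startup; with the defaults they will be much smaller.
public class HeapFlags {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.println("initial/total heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("max heap:           " + rt.maxMemory() / mb + " MB");
    }
}
```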

jcanny commented 9 years ago


Well, I'm not sure about that. The GC is an extremely complicated beast. But anyway, I wouldn't use the JVM's free memory directly to analyze BIDMach memory use.

-John
