What happens if you enable workspaces?
Adding either .trainingWorkspaceMode(WorkspaceMode.SINGLE) or .trainingWorkspaceMode(WorkspaceMode.SEPARATE) results in the same behavior.
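For reference, here is a minimal sketch of where those calls sit on the 0.9.x configuration builder (the single output layer and its sizes are placeholders just to make the snippet complete):

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.WorkspaceMode;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Workspace modes are set on the builder, before .list():
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .trainingWorkspaceMode(WorkspaceMode.SINGLE)   // or WorkspaceMode.SEPARATE
        .inferenceWorkspaceMode(WorkspaceMode.SINGLE)
        .list()
        .layer(0, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(36).nOut(2).activation(Activation.SOFTMAX).build())
        .build();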
At the moment, OOM errors go away if I manually set the OMP_NUM_THREADS environment variable to a low value (e.g. 1). Larger values, or not setting it at all, result in application memory usage that just climbs until an OOM error is thrown. There is an added benefit: setting it to 1 gives a several-fold improvement in efficiency relative to not setting it at all. The downside is that this is not an ideal fix for deployment in some of our settings (a client-side application, where this environment variable would need to be set on each client; there is no way to do so via Java, AFAIK).
That makes no sense.
What's your OS and OpenMP implementation?
OS and specs are in the original post, copied here: Windows 7, 64-bit, 40 GB RAM, Intel Xeon CPU E5-2620 @ 2.0 GHz (x2). As for OpenMP, forgive me because I'm a bit in the dark here; I'm just adding the deeplearning4j and nd4j dependencies to my pom and running through Eclipse.
Hi, the same happens here with my app.
I have a binary-output classification problem with 36 features; the training source files have 250,000 lines of double values.
// (imports added for completeness; 0.9.1 API)
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration.ListBuilder;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Builds an MLP classifier configuration: inputNum -> hiddenLayers[] -> softmax(outputNum).
public static MultiLayerConfiguration getClassificationMLPConf(double learningRate, int inputNum, int outputNum, int[] hiddenLayers) {
    ListBuilder lb = new NeuralNetConfiguration.Builder().seed(123)
            .learningRate(learningRate)
            .iterations(1)
            .updater(Updater.NESTEROVS)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .list();
    // First hidden layer takes the raw features
    lb.layer(0, new DenseLayer.Builder()
            .nIn(inputNum)
            .nOut(hiddenLayers[0])
            .activation(Activation.RELU)
            .weightInit(WeightInit.XAVIER)
            .build());
    // Remaining hidden layers
    for (int i = 0; i < hiddenLayers.length - 1; i++) {
        int nIn = hiddenLayers[i];
        int nOut = hiddenLayers[i + 1];
        int layerInt = i + 1;
        lb.layer(layerInt, new DenseLayer.Builder()
                .nIn(nIn)
                .nOut(nOut)
                .activation(Activation.RELU)
                .weightInit(WeightInit.XAVIER)
                .build());
    }
    // Softmax output layer for classification
    lb.layer(hiddenLayers.length, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(hiddenLayers[hiddenLayers.length - 1])
            .nOut(outputNum)
            .activation(Activation.SOFTMAX)
            .weightInit(WeightInit.XAVIER)
            .build());
    return lb.backprop(true).pretrain(false).build();
}
Invoked as Dl4jUtils.getClassificationMLPConf(0.01, 36, 2, new int[] {75, 75, 75}).
When the fitting process is looped 600 times in a newly created thread, memory consumption keeps increasing... until out of memory...
My machine configuration: (screenshot omitted)
Hi, I can also confirm that when running the fitting process in the main thread, memory consumption is quite normal; it took about 3 GB in my situation.
Below is my POM:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>0.9.1</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>0.9.1</version>
</dependency>
So, in other words: everything works fine in a single thread, and goes mad in parallel fitting threads IF OpenMP is used.
I'll try to profile it.
Hi, I just tried using these two configurations:

.inferenceWorkspaceMode(WorkspaceMode.SEPARATE)
.trainingWorkspaceMode(WorkspaceMode.SEPARATE)

but this doesn't solve my OOM problem.
Can you confirm my words in the previous comment? I wonder if "Java single thread + multiple OpenMP threads" works fine, while "Java multiple threads + multiple OpenMP threads" causes OOM.

I don't know much about OpenMP or the implementation behind DL4J. In my situation:
for (int i = 0; i < epocNum; i++) {
    theNet.fit(dsi);
}

means running the training in the main thread, while the threaded case looks like this:
Thread t = new Thread(new Executor());
t.setName("C-Executor-" + getAndUpdateExecutorCount());
t.start();

private class Executor implements Runnable {
    private String folderPath;
    private int classIndex;
    private int classCount;
    private MultiLayerNetwork theNet;

    Executor() {
        folderPath = resources.get("folderPath").toString();
        classIndex = Integer.parseInt(resources.get("classIndex").toString());
        classCount = Integer.parseInt(resources.get("classCount").toString());
        theNet = new MultiLayerNetwork(
                Dl4jUtils.getClassificationMLPConf(learningRate, inputNum, outputNum, hiddenLayers));
        theNet.init();
        theNet.setListeners(new ScoreIterationListener(1000));
    }

    @Override
    public void run() {
        Random rand = new Random(123L);
        int trainFileIndex = rand.nextInt(3);
        DataSetIterator dsi = null;
        try {
            dsi = Dl4jUtils.getClassificationDataSetIter(
                    folderPath + trainFileIndex + ".csv",
                    500, classIndex, classCount);
        } catch (IOException | InterruptedException e) {
            logger.warn("", e);
        }
        // The fit loop: this is what leaks memory when run in this thread
        for (int i = 0; i < epocNum; i++) {
            theNet.fit(dsi);
        }
        try {
            ModelSerializer.writeModel(theNet, "....\\theNet.net", true);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        // Evaluate on the files not used for training
        for (int i = 0; i < 2; i++) {
            if (i == trainFileIndex)
                continue;
            try {
                dsi = Dl4jUtils.getClassificationDataSetIter(
                        folderPath + i + ".csv", 1000, classIndex, classCount);
            } catch (IOException | InterruptedException e) {
                logger.warn("", e);
            }
            analyzeAndSaveResult(dsi, i);
        }
    }

    private void analyzeAndSaveResult(DataSetIterator dsi, int i) {
        Evaluation eval = new Evaluation(outputNum);
        while (dsi.hasNext()) {
            DataSet ds = dsi.next();
            INDArray features = ds.getFeatures();
            INDArray labels = ds.getLabels();
            INDArray pred = theNet.output(features, false);
            eval.eval(labels, pred);
        }
        logger.info(eval.stats());
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("....." + i + ".csv"))) {
            bw.write(eval.stats());
        } catch (IOException e) {
            logger.warn("", e);
        }
    }
}
I didn't implement OpenMP myself; I assume it comes from some implementation behind DL4J, doesn't it?
You modify OpenMP behavior with the OMP_NUM_THREADS environment variable. If you don't set it, the app uses the default settings and reports a number of threads equal to your number of cores at startup. If you set this variable to something, that value will be used instead.
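For example, a quick sanity check (plain Java, nothing DL4J-specific) of what OpenMP will see in a given process:

// Prints the value OpenMP will pick up; null means the variable is unset,
// so OpenMP falls back to its default (typically one thread per core).
String omp = System.getenv("OMP_NUM_THREADS");
System.out.println("OMP_NUM_THREADS = " + (omp == null ? "<unset>" : omp));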
Hi, I just tried setting the OS-wide environment variable OMP_NUM_THREADS = 1, and so far so good.
But is that the only way to solve this? I mean, touching the OS-wide environment variable?
@cinqs you can set it for your app too. No environment variable has to be permanently set; that's Unix 101. Depending on whether you're running in IntelliJ or your server app, just run things within your environment with that config. Please remove the global one if you can; it's silly to set an env variable for a specific app at a global level.
OpenMP (please do NOT ignore what it is and what it does, since that's actually your problem) has some issues with multi-threading because it is itself multi-threaded for loops internally. What you're running into here is the for loops in OpenMP clashing with the behavior of the ones in Java, since the ones in Java are actually real threads. Setting the number of threads to 1 is very common when running Spark applications and other kinds of multi-threading.
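One way to scope the variable per application rather than OS-wide is to set it on the child process when launching the training JVM. A minimal sketch (the launcher and main-class names here are hypothetical):

import java.io.IOException;

// Hypothetical launcher: starts the training JVM with OMP_NUM_THREADS=1
// visible only to that child process; no OS-wide variable needed.
public class TrainingLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("java", "-Xmx512m", "com.example.TrainingMain");
        pb.environment().put("OMP_NUM_THREADS", "1"); // scoped to this process tree only
        pb.inheritIO(); // forward the child's stdout/stderr to this console
        Process p = pb.start();
        p.waitFor();
    }
}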
@agibsonccc Hi, yes, I will absolutely remove the OS-wide env variable, thanks for the reminder. I will try to set it inside the app instead.
Besides, I read some of OpenMP's docs, but it's too much for me to digest quickly. Do you have any documents in your website's doc center that explain your use of OpenMP, i.e. docs specific to DL4J? It would also be nice if you could recommend some docs that narrow the scope of OpenMP. Thanks.
Just a reminder: you can remove the bug tag, since this is absolutely not a bug... maybe it should be called an enhancement...
Ok, I found it.
Issue Description
I am attempting to train models in parallel using threading or an ExecutorService. In doing so, process memory seems to continually grow until an OutOfMemoryError is thrown. Below are the code (adapted from the deeplearning4j regression example), VM arguments, pom, and exception. (Note: for demonstration, this example does not necessarily train models in parallel, but it reproduces the problem on my system nonetheless.)
The following VM arguments are passed to the JVM (low memory values, so that the problem shows up in a reasonable amount of time):
-Xms256m -Xmx512M -Dorg.bytedeco.javacpp.maxbytes=1G -Dorg.bytedeco.javacpp.maxPhysicalBytes=1G
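As an aside, one way to confirm those JavaCPP limits took effect is to query JavaCPP's own counters; this assumes org.bytedeco.javacpp.Pointer is on the classpath (it is, via nd4j-native) and that the JavaCPP version in use exposes these accessors:

import org.bytedeco.javacpp.Pointer;

// Prints the off-heap limits JavaCPP resolved from
// -Dorg.bytedeco.javacpp.maxbytes / -Dorg.bytedeco.javacpp.maxPhysicalBytes.
public class LimitCheck {
    public static void main(String[] args) {
        System.out.println("maxBytes = " + Pointer.maxBytes());
        System.out.println("maxPhysicalBytes = " + Pointer.maxPhysicalBytes());
    }
}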
pom.xml
The code can be run under different methods by changing THREADING_TYPE to one of SYNCHRONOUS, ASYNCHRONOUS, or THREAD_POOLED; the first option behaves fine, while the latter two throw the following exception.
Version Information
deeplearning4j and Nd4j 0.9.1; Java 1.7; CPU only; Windows 7, 64-bit; 40 GB RAM; Intel Xeon CPU E5-2620 @ 2.0 GHz (x2)
VisualVM for SYNCHRONOUS: (screenshot omitted)
VisualVM for ASYNCHRONOUS: (screenshot omitted)
Process (off-heap) memory just seems to continually grow when models are fit in a separate Thread.
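To watch that off-heap growth from inside the process alongside VisualVM, here is a small Java 7-compatible sketch using the same JavaCPP counters (again assuming they are exposed by the JavaCPP version on the classpath):

import org.bytedeco.javacpp.Pointer;

// Daemon thread that periodically logs JavaCPP's tracked off-heap bytes and
// the process's resident bytes, to correlate with the VisualVM graphs.
Thread watcher = new Thread(new Runnable() {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            System.out.println("javacpp totalBytes=" + Pointer.totalBytes()
                    + ", physicalBytes=" + Pointer.physicalBytes());
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
});
watcher.setDaemon(true);
watcher.start();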