deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
http://deeplearning4j.konduit.ai
Apache License 2.0
13.71k stars 3.84k forks

OutOfMemoryException #4013

Closed JZ051 closed 7 years ago

JZ051 commented 7 years ago

Issue Description

An OutOfMemoryException occurs after some hours of repeated DL4J runs (each run goes from reading and parsing data into DataSets, through splitting, to training and testing).

Pom, Code, Stacktrace in this Gist

I'm running DL4J repeatedly. I believe I reinitialize all variables between runs, so I did not expect such an error. It seems that something remains in off-heap memory while new allocations keep being added.

I start my experiments as a JUnit test and pass these parameters: -Dorg.bytedeco.javacpp.maxbytes=8589934592 -Dorg.bytedeco.javacpp.maxphysicalbytes=8589934592 -Xmx1g
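To double-check that the limits above actually reach the JVM that runs the test, a small helper can print them at startup. This is a minimal sketch using only the Java standard library; the class name `OffHeapLimits` and its methods are hypothetical and not part of DL4J or JavaCPP. It only reads the two system properties quoted in the report and the heap size implied by `-Xmx`.

```java
import java.util.Locale;

public class OffHeapLimits {

    // JavaCPP system properties named in the issue report.
    public static final String MAX_BYTES = "org.bytedeco.javacpp.maxbytes";
    public static final String MAX_PHYSICAL_BYTES = "org.bytedeco.javacpp.maxphysicalbytes";

    /** Converts a byte count to GiB (1 GiB = 2^30 bytes). */
    public static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    /** Describes one property, or reports that it was not passed to this JVM. */
    public static String describe(String key) {
        String raw = System.getProperty(key);
        if (raw == null) {
            return key + " is not set";
        }
        try {
            long bytes = Long.parseLong(raw);
            return String.format(Locale.ROOT, "%s = %d bytes (%.2f GiB)", key, bytes, toGiB(bytes));
        } catch (NumberFormatException e) {
            // The value may carry a unit suffix; report it unparsed.
            return key + " = " + raw + " (not a plain byte count)";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MAX_BYTES));
        System.out.println(describe(MAX_PHYSICAL_BYTES));
        // -Xmx bounds the on-heap part; Runtime reports it in bytes.
        System.out.printf(Locale.ROOT, "JVM max heap = %.2f GiB%n",
                toGiB(Runtime.getRuntime().maxMemory()));
    }
}
```

With the flags from the report, 8589934592 bytes corresponds to exactly 8 GiB of off-heap allowance, next to a 1 GiB Java heap.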

If more code is needed, I will post it into

Version Information

Steps already tried: upgrading from 0.9.1 to 0.9.2 and increasing the off-heap memory limit from the default to, in the end, 8 GB.

raver119 commented 7 years ago

Okay, what happens if/when you enable workspace use for training? It's not reflected in your source code.

thhart commented 7 years ago

Maybe related: #3961

JZ051 commented 7 years ago

@raver119: So far I had only added ".trainingWorkspaceMode(WorkspaceMode.NONE)"; I am now trying with "SINGLE". I will know more tomorrow, as it takes about 28h to crash with that error. I think we should put my issue aside until the test with "SINGLE" has gone through.

thhart commented 7 years ago

@JZ051 Can you try to reproduce it with lower memory settings? Then you will hit the error earlier.
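As a rough guide for choosing a lower limit, the expected time to failure can be estimated from the figures already in this thread. The sketch below assumes that off-heap usage grows linearly from zero and that the reported crash after about 28h happened under the 8 GiB limit; neither was measured here, and the class and method names are hypothetical. A real run also needs some baseline memory for legitimate training, so the limit cannot be lowered arbitrarily.

```java
import java.util.Locale;

public class CrashTimeEstimate {

    private static final long GIB = 1024L * 1024L * 1024L;

    /**
     * Estimates the hours until a new off-heap limit is exhausted, given that
     * observedLimitBytes was exhausted after observedHours.
     * Assumes linear growth from zero, which is a simplification.
     */
    public static double hoursUntilLimit(double observedHours,
                                         long observedLimitBytes,
                                         long newLimitBytes) {
        return observedHours * newLimitBytes / observedLimitBytes;
    }

    public static void main(String[] args) {
        double observedHours = 28.0;   // "about 28h" from the thread
        long observedLimit = 8 * GIB;  // 8589934592 bytes from the JVM flags

        for (long gib : new long[] {1, 2, 4}) {
            double hours = hoursUntilLimit(observedHours, observedLimit, gib * GIB);
            System.out.printf(Locale.ROOT, "limit %d GiB -> roughly %.1f h%n", gib, hours);
        }
    }
}
```

Under these assumptions, a 1 GiB limit would fail after roughly 3.5 hours instead of 28.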

JZ051 commented 7 years ago

@thhart Yes, I could. But I wouldn't know exactly how low I could go, and I'm leaving soon for another assignment at work, so I wouldn't know until tomorrow anyway. And I'd like my program to generate some data until then.

thhart commented 7 years ago

@raver119 This is a nasty one for sure, please let us know how we can help narrow down the cause.

raver119 commented 7 years ago

No worries, once I merge my current branch, I'll focus on your 2 issues.

JZ051 commented 7 years ago

I closed the issue for the moment, as it looks like it has been resolved - by replacing .trainingWorkspaceMode(WorkspaceMode.NONE) with .trainingWorkspaceMode(WorkspaceMode.SINGLE). In the beginning, I thought that "NONE" was to be used if no workspace was set up in the code. Sorry for the bother.
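For readers landing here with the same symptom, the change described above might look roughly like this. It is a sketch assuming the 0.9.x builder API discussed in this thread and the DL4J dependencies on the classpath; the class name, layer sizes, activations, and loss function are hypothetical placeholders and are not taken from the reporter's Gist. Later DL4J releases may name the workspace modes differently.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.WorkspaceMode;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class WorkspaceModeExample {

    /** Builds a small placeholder network with training workspaces enabled. */
    public static MultiLayerNetwork buildNetwork() {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                // The setting that resolved this issue: SINGLE instead of NONE.
                .trainingWorkspaceMode(WorkspaceMode.SINGLE)
                .list()
                // Placeholder layers; sizes are illustrative only.
                .layer(0, new DenseLayer.Builder()
                        .nIn(4).nOut(8)
                        .activation(Activation.RELU)
                        .build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(8).nOut(3)
                        .activation(Activation.SOFTMAX)
                        .build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }
}
```

As the comment above notes, "NONE" switches workspaces off for training rather than meaning "no workspace was configured manually", which is why the mode had to be set explicitly.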

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.