Closed: anirudh9119 closed this issue 8 years ago.
I have no idea why it happened. If you changed nothing, perhaps someone else accidentally ran a job on the same machine.
I'm assuming these are GPU out-of-memory errors. Since the GPUs are in exclusive mode, other users shouldn't matter. What could make a difference, though, is the memory allocator (CNMeM), a change in cuDNN version, etc.
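For reference, the CNMeM allocator is controlled through Theano flags; a sketch of how it might be toggled when reproducing the error (flag names as in the Theano 0.8-era old GPU backend, so treat the exact spelling as an assumption):

```shell
# Preallocate 90% of GPU memory via the CNMeM allocator (value is a
# fraction between 0 and 1; 0 disables CNMeM entirely). Changing this
# is one way to check whether the allocator is behind the OOM errors.
THEANO_FLAGS='device=gpu0,floatX=float32,lib.cnmem=0.9' python nmt_single.py
```

With CNMeM disabled (`lib.cnmem=0`), allocation failures tend to surface at different points than with preallocation, which can help localize the problem.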
I had memory errors too, but solved them by limiting sentences to length 50 (see PRs).
— Reply to this email directly or view it on GitHub https://github.com/bartvm/nmt/issues/41#issuecomment-187280830.
I'm using a rather short sentence length, i.e. 50. With the standard configuration of hyperparameters, I've never experienced such an out-of-memory issue. Since the memory requirement in my case doesn't exceed 4 GB, I don't understand why longer sentences would be the cause if you run the experiments on 12 GB GPUs.
I'm currently working only with nmt_single.py. Does this mean that the multi-GPU setup usually requires much more memory than the single-GPU model using nmt_single.py?
By the way, which PRs do you mean, i.e. the ones where the error was solved by limiting sentence length?
I don't know the reason, but I don't face any issues if I clip the length of the sentences.
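The length clipping discussed above could be sketched like this (the function name and data layout are illustrative, not the repo's actual code from the PRs):

```python
# Hypothetical sketch: drop sentence pairs longer than `maxlen` tokens
# before batching, so the longest sequence in any batch bounds the peak
# GPU memory needed for the recurrent computation.
def filter_by_length(pairs, maxlen=50):
    """Keep only (source, target) pairs where both sides fit in maxlen."""
    return [(src, trg) for src, trg in pairs
            if len(src) <= maxlen and len(trg) <= maxlen]

pairs = [(["a"] * 10, ["b"] * 12),   # short pair: kept
         (["a"] * 60, ["b"] * 5)]    # source too long: dropped
print(filter_by_length(pairs, maxlen=50))
```

Memory for an RNN step scales roughly with batch size times sequence length, which is consistent with clipping making the errors go away.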
It shouldn't have been removed? It's still in, e.g., #43; I just haven't merged it yet.
@bartvm https://github.com/bartvm Why did you remove the shuffle from data_iterator? Was there some issue? Sorry if I missed something in the notifications.
Yeah, I saw that again.
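For context, the shuffle in question is a per-epoch shuffle of the parallel corpus; the key point is that source and target must be shuffled together so sentence pairs stay aligned. A minimal sketch (names are hypothetical, not the repo's actual data_iterator):

```python
import random

# Hypothetical sketch: shuffle a parallel corpus between epochs while
# keeping each source sentence paired with its target translation.
def shuffled_pairs(source_lines, target_lines, seed=1234):
    pairs = list(zip(source_lines, target_lines))
    rng = random.Random(seed)   # fixed seed for reproducible epochs
    rng.shuffle(pairs)
    return pairs

src = ["s1", "s2", "s3"]
trg = ["t1", "t2", "t3"]
for s, t in shuffled_pairs(src, trg):
    # Indices still match after shuffling, e.g. "s2" stays with "t2".
    assert s[1:] == t[1:]
```

Shuffling jointly like this (rather than shuffling each file independently) is what keeps the training pairs valid.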
I have been facing out-of-memory errors now (no part of the code has changed; everything is exactly as before). Has anyone experienced this? I am running on Kepler (12 GB) with 4 GPUs.