bartvm / nmt

Neural machine translation
MIT License

Out of Memory errors #41

Closed: anirudh9119 closed this issue 8 years ago

anirudh9119 commented 8 years ago

I have been getting out-of-memory errors lately, even though no part of the code has changed; everything is exactly as before. Has anyone else run into this? I am running on Kepler GPUs (12 GB each), 4 of them.

JinseokNam commented 8 years ago

I have no idea why this is happening. If you changed nothing, my guess is that someone else might accidentally be running a job on the same machine.
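One way to test the guess that another job is sharing the GPU is to check per-GPU memory use before launching training. The following is a minimal sketch in Python. It assumes `nvidia-smi` is on the PATH. The helpers `parse_gpu_memory` and `busy_gpus` and the 500 MiB cutoff are hypothetical choices for illustration and are not part of the nmt repo:

```python
import subprocess

# nvidia-smi query: one CSV line per GPU; with "nounits", memory is in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into {gpu_index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = (int(used), int(total))
    return usage


def busy_gpus(threshold_mib=500):
    """Return indices of GPUs already using more than threshold_mib.

    The default threshold is a hypothetical cutoff, not a recommended value.
    """
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    usage = parse_gpu_memory(result.stdout)
    return [index for index, (used, _total) in usage.items() if used > threshold_mib]
```

Calling `busy_gpus()` right before starting a job lists GPUs that something else is already occupying. With CnMeM enabled, Theano preallocates a large share of GPU memory up front. High usage on the GPU your own job is running on is therefore expected and is not by itself a sign of another user.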

bartvm commented 8 years ago

I'm assuming these are GPU out-of-memory errors. Since the GPUs are in exclusive mode, other users shouldn't matter. What could make a difference, though, is the memory allocator (CnMeM), a change in the cuDNN version, etc.

I had memory errors too, but solved them by limiting sentences to a maximum length of 50 (see PRs). (In reply to JinseokNam, Feb 22, 2016, 12:31.)

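Limiting sentence length can be done by dropping over-long pairs while reading the parallel corpus. The following is a sketch that assumes whitespace-tokenized source and target text read line by line. `length_filtered_pairs` is a hypothetical helper and is not taken from the repo's data_iterator:

```python
def length_filtered_pairs(source_lines, target_lines, max_len=50):
    """Yield (source_tokens, target_tokens) for aligned lines whose two sides
    both have at most max_len whitespace-separated tokens.

    If either side is too long, the whole pair is skipped, so the remaining
    corpus stays aligned.
    """
    for src_line, tgt_line in zip(source_lines, target_lines):
        src, tgt = src_line.split(), tgt_line.split()
        if len(src) <= max_len and len(tgt) <= max_len:
            yield src, tgt
```

Because it is a generator, it also works directly on open file objects for the two sides of the corpus, without loading them into memory.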

JinseokNam commented 8 years ago

I'm using a fairly short maximum sentence length, i.e. 50. With the "standard configuration" of hyperparameters, I've never run into this out-of-memory issue. Since the memory requirement in my case doesn't exceed 4 GB, I don't understand why longer sentences would be the cause when you run the experiments on 12 GB GPUs.
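A back-of-the-envelope estimate helps explain why length can matter even on a 12 GB card. Every sentence in a minibatch is padded to the longest one, so a single long sentence raises the cost for the whole batch. The sketch below counts only forward activations for a generic attention encoder-decoder in float32. The layer sizes used here (batch 80, hidden size 1000, 30k vocabulary) are hypothetical. Real usage is several times higher once gradients and the intermediate values kept for backpropagation are included:

```python
def estimate_activation_bytes(batch_size, max_src_len, max_tgt_len,
                              dim, vocab_size, bytes_per_value=4):
    """Rough size of the main forward activations for one padded batch.

    Assumes a generic attention encoder-decoder with a bidirectional encoder
    and float32 values; this is not a model of the nmt code itself.
    """
    encoder = batch_size * max_src_len * 2 * dim        # forward + backward encoder states
    decoder = batch_size * max_tgt_len * dim            # decoder hidden states
    attention = batch_size * max_src_len * max_tgt_len  # one weight per (source, target) position
    logits = batch_size * max_tgt_len * vocab_size      # output distribution over the vocabulary
    return (encoder + decoder + attention + logits) * bytes_per_value


for length in (50, 100):
    gb = estimate_activation_bytes(80, length, length, 1000, 30000) / 1e9
    print(f"max length {length}: ~{gb:.2f} GB of forward activations")
```

With these assumed sizes, a padded length of 50 gives about 0.53 GB and a length of 100 gives about 1.06 GB. Most terms grow linearly with length, and the attention term grows quadratically. An unclipped outlier of several hundred tokens in a single batch can therefore push usage well past what a length-50 run suggests.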

I'm currently working only with nmt_single.py. Does this mean that the multi-GPU setup usually needs much more memory than the single-GPU model in nmt_single.py?

By the way, which PRs do you mean, the ones where the error was solved by limiting sentence length?

anirudh9119 commented 8 years ago

I don't know the reason, but I don't run into any issues if I clip the length of the sentences.
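Clipping differs from filtering: the sentence is kept but truncated, which also caps the width of the padded batch. The following is a sketch that assumes sentences are already lists of token ids and that id 0 is reserved for padding (both are assumptions). `clip_and_pad` is a hypothetical helper:

```python
def clip_and_pad(batch, max_len=50, pad_id=0):
    """Truncate each token-id sequence to max_len, then pad all sequences to
    the longest remaining one.

    Returns (padded, mask) as lists of lists. A mask entry of 1 marks a real
    token and 0 marks padding. Assumes a non-empty batch and that pad_id is
    reserved for padding (a hypothetical convention).
    """
    clipped = [seq[:max_len] for seq in batch]
    width = max(len(seq) for seq in clipped)
    padded = [seq + [pad_id] * (width - len(seq)) for seq in clipped]
    mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in clipped]
    return padded, mask
```

Clipping the source and target of a pair independently can cut off the part of the target that translates the kept source. That is one reason to drop long pairs during training rather than truncate them.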

bartvm commented 8 years ago

It shouldn't have been removed? It's still in, e.g., #43; I just haven't merged it yet.

In reply to: "@bartvm Why did you remove the shuffle from data_iterator? Was there some issue? Sorry if I missed something in the notifications."

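For reference, shuffling a parallel corpus has to apply the same permutation to both sides so that each source sentence stays with its translation. The following is a minimal sketch that assumes both sides fit in memory as lists of lines. `shuffle_parallel` is hypothetical and is not taken from the repo's data_iterator, whose code the thread does not show:

```python
import random


def shuffle_parallel(source_lines, target_lines, seed=1234):
    """Shuffle a parallel corpus with one shared permutation.

    Using a seeded random.Random instance makes the order reproducible
    without touching the global random state.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    order = list(range(len(source_lines)))
    random.Random(seed).shuffle(order)
    return [source_lines[i] for i in order], [target_lines[i] for i in order]
```

Shuffling a list of indices, rather than each side separately, is what keeps the pairs aligned. Passing a different seed per epoch gives a new order each time.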

anirudh9119 commented 8 years ago

Yeah, I saw that again.

(In reply to Bart van Merriënboer, Wed, Feb 24, 2016, 10:56 AM.)
