ZhenYangIACAS / NMT_GAN

generative adversarial nets for neural machine translation
Apache License 2.0

GPU out of memory #7

Closed jeicy07 closed 6 years ago

jeicy07 commented 6 years ago

Hi, I use a single GPU with 24G of memory to pretrain the generator and generate samples. However, after pretraining the generator for only 2 epochs, training stops with an OOM (out of memory) error. And when I try to generate samples with the last saved model, it soon hits OOM again. So I wonder how much GPU memory is needed to run this project, thanks a lot! By the way, I have reduced the batch size from 256 in your code to 100; I don't know whether that helps.

ZhenYangIACAS commented 6 years ago

@jeicy07 You mean you only use one GPU with 24G of memory? In our experiments, we used 8 GPUs to train our model. If you only have one GPU, I suggest setting the parameter "tokens_per_batch" to a smaller value.
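For anyone else hitting this: the intuition is that the batcher packs sentences until a token budget is reached, so lowering that budget shrinks every batch and its activation memory. Below is a minimal sketch of that kind of token-budget batching; the function and variable names are illustrative, not the actual code in this repo (only the "tokens_per_batch" parameter name comes from the project).

```python
def batches_by_token_budget(sentences, tokens_per_batch):
    """Yield batches whose total token count stays under tokens_per_batch.

    Illustration only: a simplified stand-in for how a tokens_per_batch
    budget typically bounds batch size, not this project's batching code.
    """
    batch, batch_tokens = [], 0
    for sent in sentences:
        n = len(sent.split())
        if batch and batch_tokens + n > tokens_per_batch:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(sent)
        batch_tokens += n
    if batch:
        yield batch

# On a single 24G GPU, try halving the budget (e.g. 3000 instead of 6000)
# until pretraining fits in memory.
```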

jeicy07 commented 6 years ago

Yep, our lab doesn't have that kind of equipment. I'll lower "tokens_per_batch" and see whether it still goes OOM. Thanks!

kellymarchisio commented 6 years ago

Hi - I've experienced similar ResourceExhaustedErrors when using this codebase, sometimes only after many epochs and many hours of training. If it fails after many epochs rather than within the first few steps, that suggests a memory leak to me. Has anyone looked into this?
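One cheap way to check the leak hypothesis is to log used GPU memory once per epoch and see whether it climbs steadily. A small sketch using nvidia-smi polling (the helper name and the choice of GPU 0 are mine, not from this repo):

```python
import subprocess

def gpu_memory_used_mib():
    """Query used memory (MiB) on GPU 0 via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", "--id=0"])
    return int(out.decode().strip().splitlines()[0])

# Call this once per epoch (or every N steps) from the training loop and
# print the value: a value that rises monotonically across epochs points
# to a leak, while a flat curve that spikes on an unusually long batch
# points to the batching/token budget instead.
```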

xixiddd commented 6 years ago

@kellymarchisio Which version of TensorFlow did you use? I hit a similar error with TensorFlow 1.9; maybe switching to a lower version would solve it.
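Not sure the TF version is the whole story, but one setting that sometimes helps with TF 1.x OOMs is letting the session allocate GPU memory on demand instead of grabbing it all up front. This won't fix a genuine leak, but it can avoid fragmentation-related failures; a minimal example with the standard TF 1.x session config (how you'd wire it into this repo's training script is up to you):

```python
import tensorflow as tf  # TF 1.x API

# Allocate GPU memory incrementally rather than reserving the full device.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Optionally cap the fraction of GPU memory TF may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.9

sess = tf.Session(config=config)
```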