facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

out of memory #103

Closed: travel-go closed this issue 6 years ago

travel-go commented 6 years ago

Hello, I want to try the 'fconv_wmt_en_de' model. This is my command:

CUDA_VISIBLE_DEVICES=0,1,2 python3 train.py /data2/hfyu/fairseq-en-de-bpe/data-bin --lr 0.25 --clip-norm 0.1 --dropout 0.1 --max-tokens 2000 --arch fconv_wmt_en_de --save-dir /data2/hfyu/fairseq-en-de-bpe/training &

When I set --max-tokens 4000, training fails with this error:

THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
| WARNING: ran out of memory on GPU #2, skipping batch

When I set --max-tokens 2000, the same error still occurs, but only once:

| using 3 GPUs (with max tokens per GPU = 2000 and max sentences per GPU = None)
| model fconv_wmt_en_de, criterion CrossEntropyCriterion
| num. model params: 217643510
| epoch 001: 0%|▏ | 315/72115 [01:24<5:21:16, 3.72it/s, loss=11.45 (12.65), wps=6799, wpb=5476, bsz=196, lr=0.25, clip=100%, gnorm=1.88677, oom=0]
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
| WARNING: ran out of memory on GPU #2, skipping batch
| epoch 001: 0%|▏ | 318/72115 [01:25<5:21:15, 3.72it/s, loss=12.26 (12.65), wps=6804, wpb=5480, bsz=195, lr=0.25, clip=100%, gnorm=1.87365, oom=0.00943396]
| WARNING: ran out of memory on GPU #2, skipping batch
| epoch 001: 1%|▎ | 594/72115 [02:39<5:20:53, 3.71it/s, loss=11.47 (11.92), wps=6732, wpb=5437, bsz=189, lr=0.25, clip=100%, gnorm=1.28706, oom=0.0101523]

Does this have any effect on the training result? Should I set --max-tokens lower?
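
For context on what --max-tokens controls: fairseq builds minibatches by token count rather than sentence count, packing sentences together until the padded batch would exceed the budget, which is why the logged bsz varies around 190-196 above. A minimal sketch of that idea, using a hypothetical token_batches helper and ignoring fairseq's length-based sorting:

# Simplified sketch of token-based batching (illustrative only; the real
# implementation also sorts sentences by length to reduce padding).
def token_batches(sentence_lengths, max_tokens=2000):
    batch, batch_max_len = [], 0
    for length in sentence_lengths:
        new_max = max(batch_max_len, length)
        # Padded size of the batch if this sentence were added to it.
        if batch and new_max * (len(batch) + 1) > max_tokens:
            yield batch
            batch, new_max = [], length
        batch.append(length)
        batch_max_len = new_max
    if batch:
        yield batch

# Example: one long sentence forces a batch with fewer sentences.
for b in token_batches([30, 25, 40, 100, 15, 60], max_tokens=200):
    print(b)  # -> [30, 25, 40], then [100, 15], then [60]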

edunov commented 6 years ago

If it was only one batch that OOMed, it probably doesn't matter. In this case, the model just skipped an update from one minibatch. So, you're most likely fine.
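
The skip-on-OOM behavior described here is a common PyTorch pattern: catch the out-of-memory RuntimeError for that one batch, discard its gradients, and move on. A hedged sketch of the pattern (not fairseq's actual trainer code), with a hypothetical train_step function:

import torch

# If one batch blows past GPU memory, free what we can and continue
# instead of crashing the whole run.
def train_step(model, optimizer, criterion, batch):
    try:
        optimizer.zero_grad()
        loss = criterion(model(batch["input"]), batch["target"])
        loss.backward()
        optimizer.step()
        return loss.item()
    except RuntimeError as e:
        if "out of memory" in str(e):
            print("| WARNING: ran out of memory, skipping batch")
            optimizer.zero_grad()      # discard any partial gradients
            torch.cuda.empty_cache()   # release cached GPU memory
            return None                # one update is skipped, nothing else
        raise

Since only the one oversized batch is dropped, the effect on the final model is negligible.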

travel-go commented 6 years ago

Thanks for your reply. Why can --max-tokens only be set to 2000 here? Is it because the 'fconv_wmt_en_de' model has too many parameters?

edunov commented 6 years ago

fconv_wmt_en_de is much bigger than fconv_iwslt_de_en, so it does take more memory. In my setup I was able to run it with --max-tokens 4000, but there could be other factors at play, e.g. I use Nvidia P100 GPUs and my BPE vocab size is 40k tokens. If you have a different GPU or a bigger vocab size, you may need a smaller number of tokens.
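
To see why the vocab size matters here, consider just the decoder's output logits: they form a (tokens per batch) x (vocab size) float32 tensor, before counting gradients or the rest of the network. A back-of-envelope sketch with a hypothetical logits_mib helper:

# Rough lower bound on one memory cost that scales with both
# --max-tokens and vocab size (an estimate, not a measurement).
def logits_mib(max_tokens, vocab_size, bytes_per_float=4):
    return max_tokens * vocab_size * bytes_per_float / 2**20

print(f"{logits_mib(4000, 40_000):.0f} MiB")  # ~610 MiB for the logits alone

With a bigger vocab, or a GPU with less memory than a P100, this tensor and its matching gradient buffers can be enough to push a --max-tokens 4000 step over the limit.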

travel-go commented 6 years ago

Thank you very much!