Closed: travel-go closed this issue 6 years ago
If it was only one batch that OOMed, it probably doesn't matter. In that case the model just skipped the update from that one minibatch, so you're most likely fine.
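For illustration, here is a minimal sketch of that skip-on-OOM pattern: run the training step, and if it raises an out-of-memory runtime error, log a warning and skip the update instead of crashing. The function name `run_step` is hypothetical, and this simplifies what fairseq's trainer actually does (which also frees cached GPU memory before continuing).

```python
def run_step(step_fn):
    """Run one training step; skip the update if it OOMs.

    Hypothetical sketch of the pattern in fairseq's trainer: a CUDA
    out-of-memory condition surfaces as a RuntimeError whose message
    contains "out of memory", so we catch that case, warn, and move on.
    """
    try:
        return step_fn()
    except RuntimeError as e:
        if "out of memory" in str(e):
            print("| WARNING: ran out of memory, skipping batch")
            return None  # this minibatch contributes no update
        raise  # any other error is a real bug; re-raise it
```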
Thanks for your reply. Why can --max-tokens only be set to 2000? Is it because the 'fconv_wmt_en_de' model has too many parameters?
fconv_wmt_en_de is much bigger than fconv_iwslt_de_en, so it does take more memory. In my setup I was able to run it with --max-tokens 4000, but there could be other factors at play, e.g. I use an Nvidia P100 and a BPE vocab of 40k tokens. If you have a different GPU or a bigger vocab, you may need a smaller value.
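To make the memory connection concrete: --max-tokens caps the size of each batch by its padded token count rather than by a fixed number of sentences, which is why it directly bounds activation memory per GPU. Below is a hypothetical simplification of that batching rule (the function name `batch_by_tokens` and the exact packing heuristic are illustrative; fairseq's real iterator also sorts by length and handles other limits).

```python
def batch_by_tokens(lengths, max_tokens):
    """Group sentence indices into batches whose padded size,
    num_sentences * longest_sentence_in_batch, stays <= max_tokens.

    Simplified sketch of --max-tokens style batching: sentences are
    padded to the longest one in the batch, so that product is the
    tensor size that actually occupies GPU memory.
    """
    batches, cur, cur_max = [], [], 0
    for i, n in enumerate(lengths):
        new_max = max(cur_max, n)
        # If adding this sentence would push the padded batch over the
        # budget, close the current batch and start a new one.
        if cur and (len(cur) + 1) * new_max > max_tokens:
            batches.append(cur)
            cur, cur_max = [], 0
            new_max = n
        cur.append(i)
        cur_max = new_max
    if cur:
        batches.append(cur)
    return batches
```

A smaller --max-tokens therefore means smaller batches (and slightly noisier gradients), not a truncated dataset.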
Thank you very much!
Hello, I want to try the 'fconv_wmt_en_de' model. This is my command:
CUDA_VISIBLE_DEVICES=0,1,2 python3 train.py /data2/hfyu/fairseq-en-de-bpe/data-bin --lr 0.25 --clip-norm 0.1 --dropout 0.1 --max-tokens 2000 --arch fconv_wmt_en_de --save-dir /data2/hfyu/fairseq-en-de-bpe/training &
When I set --max-tokens 4000, I get this error:
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
| WARNING: ran out of memory on GPU #2, skipping batch
When I set --max-tokens 2000, the same error also occurs, but only once:
| using 3 GPUs (with max tokens per GPU = 2000 and max sentences per GPU = None)
| model fconv_wmt_en_de, criterion CrossEntropyCriterion
| num. model params: 217643510
| epoch 001: 0% | 315/72115 [01:24<5:21:16, 3.72it/s, loss=11.45 (12.65), wps=6799, wpb=5476, bsz=196, lr=0.25, clip=100%, gnorm=1.88677, oom=0]
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
| WARNING: ran out of memory on GPU #2, skipping batch
| epoch 001: 0% | 318/72115 [01:25<5:21:15, 3.72it/s, loss=12.26 (12.65), wps=6804, wpb=5480, bsz=195, lr=0.25, clip=100%, gnorm=1.87365, oom=0.00943396]
| WARNING: ran out of memory on GPU #2, skipping batch
| epoch 001: 1% | 594/72115 [02:39<5:20:53, 3.71it/s, loss=11.47 (11.92), wps=6732, wpb=5437, bsz=189, lr=0.25, clip=100%, gnorm=1.28706, oom=0.0101523]
Does this affect training? Should I set --max-tokens even lower?