Several things:

- `--tokens-per-sample 3072`, which is too large of a context size for a model with this many parameters. Try 1024 or even smaller (512 could be fine).
- `--memory-efficient-fp16` instead of `--fp16`. This is a slightly more aggressive version of mixed precision training which will save memory, but typically requires a large batch size and may produce slightly worse perplexities in the end.
- Another idea is to use model parallel training. We support this here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b#example-training-command-model-parallel
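For concreteness, the first two suggestions applied to the command from the question might look like this (a sketch, not a tested command; `$DATA_DIR` and `$SAVE_DIR` as in the original script):

```bash
# The original command with two changes:
#   --tokens-per-sample 3072 -> 1024
#   --fp16                   -> --memory-efficient-fp16
fairseq-train --task language_modeling \
    "$DATA_DIR" \
    --save-dir "$SAVE_DIR" \
    --arch transformer_lm_gpt2_big \
    --optimizer nag --clip-norm 0.1 \
    --lr 0.0001 --lr-scheduler cosine --max-lr 1.0 \
    --t-mult 2 --lr-period-updates 270000 --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 \
    --max-tokens 2048 --update-freq 3 \
    --tokens-per-sample 1024 --sample-break-mode eos --seed 1 \
    --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d \
    --min-lr 1e-09 \
    --memory-efficient-fp16 \
    --save-interval-updates 2000 \
    --keep-interval-updates 1
```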
Thank you for your reply @myleott.

Since `--sample-break-mode eos` is set, `--tokens-per-sample 3072` takes no effect and thus can be ignored. I set `--sample-break-mode eos` because the training corpus is sentence-level and does not contain any document or paragraph information; setting `--sample-break-mode none` also gave worse performance in my earlier experiments (using the model for sentence-level reranking).

Does `--memory-efficient-fp16` deal well with overflow? BTW, I also noticed that in my `--fp16` experiments, using adam with `inverse_sqrt` skips about 1/5 of the samples in the first epoch because of frequent `NOTE: overflow detected, setting loss scale to: 64.0` messages (I also printed the L2 norm of the gradient before clipping; it shows `inf` when an overflow is detected). When I switch to `nag`, it works well (it seems that momentum smooths the gradient).

I also note that the OOM occurs with `nag`. Does `nag` have higher memory consumption than `adam`?
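For reference, a rough back-of-envelope, assuming fairseq's `--fp16` trainer keeps an FP32 master copy of the weights plus FP32 optimizer state, and that `transformer_lm_gpt2_big` is on the order of 1.5B parameters (the exact count depends on the vocabulary):

```bash
# Approximate per-parameter optimizer state (assumptions above, not measured):
#   adam: 4 B master + 4 B exp_avg + 4 B exp_avg_sq = 12 B/param -> ~18 GB
#   nag:  4 B master + 4 B momentum buffer          =  8 B/param -> ~12 GB
python3 -c "print('adam ~%.0f GB, nag ~%.0f GB' % (1.5e9*12/1e9, 1.5e9*8/1e9))"
```

If that accounting is right, `nag` should hold less optimizer state than `adam`, which would point at activations (driven by `--tokens-per-sample` and `--max-tokens`) rather than the optimizer as the source of the OOM.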
## ❓ Questions and Help

#### What is your question?
I am training a GPT2-big model on a large dataset (6600M words) on `NVIDIA V100 32G * 8`. It shows an OOM error when training. Are there any methods to optimize GPU memory and fix the OOM?

In addition, is there any instruction or demo training script for training gpt2-large besides the LM readme?
#### Code

Here is my training script, following the [adaptive LM readme](https://github.com/pytorch/fairseq/blob/master/examples/language_model/README.adaptive_inputs.md). Note that I enable the fp16 option.

```bash
# max-tokens 2k
fairseq-train --task language_modeling \
    "$DATA_DIR" \
    --save-dir "$SAVE_DIR" \
    --arch transformer_lm_gpt2_big \
    --optimizer nag --clip-norm 0.1 \
    --lr 0.0001 --lr-scheduler cosine --max-lr 1.0 \
    --t-mult 2 --lr-period-updates 270000 --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 \
    --max-tokens 2048 --update-freq 3 \
    --tokens-per-sample 3072 --sample-break-mode eos --seed 1 \
    --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d \
    --min-lr 1e-09 \
    --fp16 \
    --save-interval-updates 2000 \
    --keep-interval-updates 1

# max-tokens 64
fairseq-train --task language_modeling \
    "$DATA_DIR" \
    --save-dir "$SAVE_DIR" \
    --arch transformer_lm_gpt2_big \
    --optimizer nag --clip-norm 0.1 \
    --lr 0.0001 --lr-scheduler cosine --max-lr 1.0 \
    --t-mult 2 --lr-period-updates 270000 --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 \
    --max-tokens 64 --update-freq 96 \
    --tokens-per-sample 3072 --sample-break-mode eos --seed 1 \
    --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d \
    --min-lr 1e-09 \
    --fp16 \
    --save-interval-updates 2000 \
    --keep-interval-updates 1
```

#### What have you tried?

- I have trained a transformer_lm_gpt2_small model with the same optimization strategy, and it works well. (I am not using adam with `inverse_sqrt` because it always shows overflow under fp16.)
- Reducing max-tokens even to 64 while increasing update-freq also results in the OOM error.

#### What's your environment?

- fairseq Version (e.g., 1.0 or master): former master (commit 775122950d145382146e9120308432a9faf9a9b8)
- PyTorch Version (e.g., 1.0): 1.4.0
- OS (e.g., Linux): Ubuntu 16.04.6 LTS
- How you installed fairseq (`pip`, source): editable mode, using:

  ```bash
  git clone https://github.com/pytorch/fairseq
  cd fairseq
  pip install --editable .
  ```

- Build command you used (if compiling from source): None
- Python version: 3.7
- CUDA/cuDNN version: CUDA runtime version 10.0.130; cuDNN version /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
- GPU models and configuration: V100 32G * 8
- Any other relevant information: None
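Since the OOM persists even at `--max-tokens 64`, one rough way to narrow down where the memory goes (a suggestion, not something from this thread) is to watch per-GPU usage while the job starts up:

```bash
# Poll per-GPU memory once per second while fairseq-train runs in another
# shell. If usage jumps at the first forward/backward pass, activations
# (driven by --tokens-per-sample) are the likely culprit; if it is already
# high right after model/optimizer setup, parameters and optimizer state are.
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1
```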