Open zhentingqi opened 4 months ago
Hi, I was training the 345M GPT-2 model using your example script examples/pretrain_gpt.sh. The validation loss and PPL, however, keep going up, while the training loss decreases as expected. My hyperparameters are shown here:
examples/pretrain_gpt.sh

```shell
GPT_ARGS="--num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --micro-batch-size 2 \
    --global-batch-size 4 \
    --lr 3.0e-4 \
    --train-iters 300000 \
    --lr-decay-iters 320000 \
    --lr-decay-style cosine \
    --min-lr 1.0e-5 \
    --weight-decay 1e-2 \
    --lr-warmup-fraction .01 \
    --clip-grad 1.0 \
    --fp16"

DATA_ARGS="--data-path $DATA_PATH \
    --vocab-file $VOCAB_FILE \
    --merge-file $MERGE_FILE \
    --data-impl mmap \
    --split 700,200,100"

OUTPUT_ARGS="--log-interval 100 \
    --save-interval 50000 \
    --eval-interval 1000 \
    --eval-iters 10"
```
Can anyone please tell me what is wrong? Shouldn't the PPL decrease? Thanks!
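For context: the reported perplexity is just the exponential of the mean per-token cross-entropy loss, so a rising validation loss and a rising validation PPL are the same symptom, not two separate problems. A minimal sketch of the relationship (the function name is illustrative, not from Megatron-LM):

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp() of the mean per-token cross-entropy loss."""
    return math.exp(cross_entropy_loss)

# A small increase in validation loss compounds into a large PPL increase:
print(perplexity(3.0))  # ~20.09
print(perplexity(3.5))  # ~33.12
```

So if training loss falls while validation loss/PPL rises, the model is fitting the training split at the expense of generalization (classic overfitting behavior), rather than the PPL computation being wrong.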
Marking as stale. No activity in 60 days.