Open artificial-insanity opened 3 years ago
Hi @artificial-insanity, no problem! The "overflow detected" warning is normal, but I have not encountered this UnboundLocalError before. From some searching, it may be a memory issue. Could you send more logs? Also, could you monitor your GPU with `watch -n 1 nvidia-smi` and observe its status while executing the command?
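If it is easier to keep a record than to watch the terminal, the same information can be written to a file. Below is a minimal sketch, assuming `nvidia-smi` is on the `PATH`. The helper names (`parse_gpu_line`, `sample_gpus`, `monitor`) and the log file name `gpu_usage.log` are made up for illustration and are not part of the repo.

```python
import subprocess
import time

# Fields requested from nvidia-smi; with "nounits", memory is reported in MiB
# and utilization in percent.
QUERY_FIELDS = "timestamp,memory.used,memory.total,utilization.gpu"


def parse_gpu_line(line):
    """Parse one CSV line of nvidia-smi output for QUERY_FIELDS."""
    timestamp, mem_used, mem_total, util = [f.strip() for f in line.split(",")]
    return {
        "timestamp": timestamp,
        "memory_used_mib": int(mem_used),
        "memory_total_mib": int(mem_total),
        "utilization_pct": int(util),
    }


def sample_gpus():
    """Query nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


def monitor(interval_s=1.0, log_path="gpu_usage.log"):
    """Append one line per GPU to log_path every interval_s seconds (stop with Ctrl+C)."""
    with open(log_path, "a") as log:
        while True:
            for index, sample in enumerate(sample_gpus()):
                log.write(
                    f"{sample['timestamp']} gpu{index} "
                    f"{sample['memory_used_mib']}/{sample['memory_total_mib']} MiB "
                    f"util={sample['utilization_pct']}%\n"
                )
            log.flush()
            time.sleep(interval_s)
```

Calling `monitor()` in a second terminal while the pretraining script runs would show whether memory use climbs toward the card's limit just before the crash.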
Hi, I am having a problem while trying to replicate the pretraining process of the model. I am running on an Ubuntu 18.04.5 LTS (GNU/Linux 5.9.11-3-MANJARO x86_64) machine with one GeForce RTX 3090 GPU (NVIDIA-SMI 455.45.01, Driver Version: 455.45.01, CUDA Version: 11.1).

After running `./scripts/pretrain/preprocess-pretrain-all.sh` to process the data provided in your repo under `data-src/pretrain_all`, and then running `./scripts/pretrain/pretrain-all.sh`, I got the error `UnboundLocalError: local variable 'num_updates' referenced before assignment`. This happened after multiple `overflow detected, setting loss scale to: XX` messages. The full log is given in the txt file below: pretrain_log.txt

Does this mean I need to tweak the parameters in `scripts/pretrain/pretrain-all.sh` to get it running? Or do I need to use data other than that provided in `data-src/pretrain_all` to run the model?

I am a novice to this whole thing, so please allow me to apologize in advance if this was not a good question. Thank you!
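
For context on the `overflow detected, setting loss scale to: XX` messages: they look like the output of dynamic loss scaling, a common technique in mixed-precision (FP16) training. The sketch below is a generic illustration of that technique and not the code used by the repo. The class name and default values are assumptions chosen for the example.

```python
import math


class DynamicLossScaler:
    """Generic dynamic loss scaler (illustrative only, hypothetical defaults)."""

    def __init__(self, init_scale=128.0, scale_factor=2.0, scale_window=4, min_scale=1e-4):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window  # clean steps required before growing the scale
        self.min_scale = min_scale
        self._good_steps = 0

    def update(self, grad_norm):
        """Return True if the optimizer step should be applied, False if it is skipped."""
        if math.isinf(grad_norm) or math.isnan(grad_norm):
            # Gradients overflowed in FP16: shrink the scale and skip this update.
            self.scale /= self.scale_factor
            self._good_steps = 0
            if self.scale < self.min_scale:
                raise FloatingPointError("minimum loss scale reached")
            print(f"overflow detected, setting loss scale to: {self.scale}")
            return False
        # No overflow: count the step and grow the scale after enough clean steps.
        self._good_steps += 1
        if self._good_steps % self.scale_window == 0:
            self.scale *= self.scale_factor
        return True
```

Under this scheme, an occasional overflow message only means that one update was skipped and the scale was halved. A long unbroken run of them means no update is being applied at all, which is worth checking in the full log.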