Thanks for the question. When I run my experiments I use only a single GPU; could you try setting CUDA_VISIBLE_DEVICES=0 and see if the problem persists?
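In case it is more convenient to do from inside a script than on the command line, the same restriction can be applied in Python, as long as the variable is set before CUDA is initialized. A minimal sketch (the print is just a sanity check):

    import os

    # Set this before any CUDA call is made -- safest is before the
    # first `import torch` anywhere in the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    import torch

    # With only device 0 visible, torch now reports a single GPU.
    print(torch.cuda.device_count())  # -> 1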
Thank you for the advice. However, it still fails after adding CUDA_VISIBLE_DEVICES=0.
Notice that the log still reports Process rank: -1, device: cuda:0, n_gpu: 8, distributed training: False, 16-bits training: False. I think some tensors may be loaded on the CPU, which triggers this problem. The code is untouched except for the data path.
I think the problem is probably that parts of the model are loaded onto different GPUs. Since the log still shows n_gpu: 8, could you specify n_gpu as 1 rather than 8?
Thanks. I forced TrainingArguments.n_gpu = 1 and it works!
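For anyone hitting the same thing: n_gpu is a read-only property on TrainingArguments, so it cannot be assigned directly on an instance. One blunt workaround is to override the property on the class before the arguments are built; this is only a sketch, and restricting CUDA_VISIBLE_DEVICES as above is the cleaner fix:

    from transformers import TrainingArguments

    # n_gpu is a property, so plain instance assignment raises
    # AttributeError. Overriding it at the class level forces every
    # instance to report one GPU, so Trainer skips the DataParallel wrap.
    TrainingArguments.n_gpu = property(lambda self: 1)

    args = TrainingArguments(output_dir="out")
    print(args.n_gpu)  # -> 1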
Hi, I ran into a RuntimeError when training a prefix model. Do you have any suggestions?
Here is the command line: python train_e2e.py --optim_prefix yes --preseqlen 5 --epoch 5 --learning_rate 0.00005 --mode webnlg --bsz 5 --seed 101 --cache_dir ./cache
Here is the error information: