Hi,
I used a 32GB GPU for the XSUM experiments. You could either switch to a GPU with more memory, or reduce the bsz and increase gradient_accumulation_steps to keep the effective batch size the same.
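For reference, here is a minimal sketch of how gradient accumulation keeps the effective batch size constant in plain PyTorch (a toy model with illustrative names, not this repo's code):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
loader = DataLoader(data, batch_size=8)  # reduced per-step bsz

grad_accum_steps = 2  # 8 * 2 = effective batch size of 16
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradient matches one full batch.
    (loss / grad_accum_steps).backward()
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()        # one update per effective batch
        optimizer.zero_grad()   # clear the accumulated gradients
```

With bsz=8 and gradient_accumulation_steps=2, the optimizer still sees gradients equivalent to a batch of 16, at the cost of extra forward/backward passes.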
Hi, I used one GPU (Tesla V100 SXM2 32GB) and ran the command from the homepage. However, I still hit OOM with bsz=16 or bsz=12; only with bsz=8 does the OOM go away. So, is the command on the homepage the one used to reproduce the paper?
Hi, it's the command used to reproduce the paper.
Could you check whether you have --fp16 yes, and whether it actually turns on half precision? With half precision enabled, bsz=16 should fit.
Side note: I used a single AWS GPU (I think it was an A100) to run all the XSUM experiments.
I have --fp16 yes. How can I tell whether this turns on half precision?
Side note: the A100 on AWS has 40GB of GPU memory, not 32GB.
Maybe check if your stdout contains this: Using native 16bit precision.
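(For context: that message indicates the trainer has enabled PyTorch's native automatic mixed precision. Roughly, it amounts to something like the following under the hood; this is a toy sketch with made-up sizes, not this repo's code.)

```python
import torch
from torch import nn

model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid fp16 underflow

x = torch.randn(16, 10, device="cuda")
y = torch.randn(16, 1, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # forward pass runs in half precision
    loss = nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

The fp16 forward pass shrinks activation memory considerably, which is why a larger bsz can fit once it is on.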
Thanks!
My stdout contains: Using native 16bit precision. It may be due to GPU memory, so I reduced the bsz to train the model.
By the way, what does --mid_dim mean?
It means the dimension of the MLP's middle layer! (We use an MLP for re-parametrization.)
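For the curious, here is a rough sketch of that re-parametrization (hypothetical names and sizes; the shapes follow the prefix-tuning idea, not necessarily this repo's exact code):

```python
import torch
from torch import nn

# Hypothetical sizes: embedding dim, layer count, prefix length,
# and the MLP's middle-layer width (i.e. --mid_dim).
n_embd, n_layer, preseqlen, mid_dim = 1024, 12, 200, 512

prefix_tokens = nn.Embedding(preseqlen, n_embd)
# The MLP maps each learned prefix embedding to key/value activations
# for every layer; mid_dim is the width of its hidden layer.
reparam_mlp = nn.Sequential(
    nn.Linear(n_embd, mid_dim),
    nn.Tanh(),
    nn.Linear(mid_dim, n_layer * 2 * n_embd),  # 2 = one key and one value per layer
)

idx = torch.arange(preseqlen)
past_kv = reparam_mlp(prefix_tokens(idx))  # shape: (preseqlen, n_layer * 2 * n_embd)
```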
Got it! Thanks for your answer!
Thanks! My mistake. I just figured out that I wasn't changing the bsz in the right place, which is why I still got OOM even with bsz=1.
Hi, I tried the seq2seq prefixtuning and hit the following error:
RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 15.90 GiB total capacity; 4.63 GiB already allocated; 797.50 MiB free; 5.81 GiB reserved in total by PyTorch)
I ran the experiment on a 16GB GPU. Am I supposed to use a 32GB GPU instead? Thanks!