XiangLi1999 / PrefixTuning

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Can a single GPU train this model? #25

Open lrongzheni opened 2 years ago

lrongzheni commented 2 years ago

I set PYTORCH_NO_CUDA_MEMORY_CACHING=1 to disable the CUDA caching allocator, which avoids the OOM, but training is very slow. Is this normal?
Epoch 0:   0%| | 3/12784 [00:07<8:32:20, 2.41s/it, loss=7.35, v_num=5]
/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
saving checkpoint now
saving models now/22..
try calling the pl_module save
Epoch 0:   0%| | 3/12784 [00:07<9:08:09, 2.57s/it, loss=7.35, v_num=5]
Traceback (most recent call last):
  File "/data/lirongzhen/PrefixTuning/seq2seq/finetune.py", line 878, in <module>
    main(args)
  File "/data/lirongzhen/PrefixTuning/seq2seq/finetune.py", line 779, in main
    trainer: pl.Trainer = generic_train(
  File "/data/lirongzhen/PrefixTuning/seq2seq/lightning_base.py", line 795, in generic_train
    trainer.fit(model)
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 470, in fit
    results = self.accelerator_backend.train()
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 68, in train
    results = self.train_or_test()
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
    results = self.trainer.train()
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 521, in train
    self.train_loop.run_training_epoch()
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 560, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 687, in run_training_batch
    self.training_step_and_backward(
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 816, in training_step_and_backward
    self.backward(result, optimizer, opt_idx)
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py", line 836, in backward
    result.closure_loss = self.trainer.accelerator_backend.backward(
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 98, in backward
    closure_loss = self.trainer.precision_connector.backend.backward(
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/plugins/native_amp.py", line 46, in backward
    model.backward(closure_loss, optimizer, opt_idx)
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1152, in backward
    loss.backward(*args, **kwargs)
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/data/anaconda3/envs/PrefixTuning/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 47.54 GiB total capacity; 41.43 GiB already allocated; 900.75 MiB free; 44.68 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
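For reference, both allocator knobs involved here (PYTORCH_NO_CUDA_MEMORY_CACHING from the comment above and PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb, suggested by the error message itself) have to be in the environment before CUDA is first used. A minimal sketch, not taken from the repo; the 128 MB value is only an illustration:

```python
import os

# Disables the caching allocator entirely: avoids fragmentation-related OOMs,
# but every allocation then hits cudaMalloc, which is why training slows down.
os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"

# Gentler alternative (PyTorch >= 1.10): keep the cache but cap the block size
# the allocator may split, to reduce fragmentation.
# os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import only after the environment variables are set
```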

XiangLi1999 commented 2 years ago

For XSUM, I think you need a GPU with larger memory (e.g., > 32 GB); otherwise training takes a very long time. For table-to-text, I think a smaller GPU is fine.

YahooHu commented 2 years ago

For XSUM, I train on a Tesla A100 (40 GB) but still get OOM. How can I solve this? @XiangLi1999

RuntimeError: CUDA out of memory. Tried to allocate 1.20 GiB (GPU 0; 39.59 GiB total capacity; 36.82 GiB already allocated; 261.50 MiB free; 37.90 GiB reserved in total by PyTorch)
Epoch 0:   0%| | 0/12784 [00:01<?, ?it/s]

zhaone commented 2 years ago

@YahooHu Hi, I met the same issue the first time I ran this code. You can follow the hyperparameters given by the author.

This setting reduces the memory to about 20 GB on my A100 card.
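To check a figure like that on your own run, a small helper of my own (not part of the repo) around PyTorch's built-in CUDA memory counters can be used; reset the peak stats before training and print after a few batches:

```python
import torch

def report_peak_memory(device: int = 0) -> None:
    """Print peak allocated/reserved CUDA memory for one device, in GiB."""
    allocated = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    reserved = torch.cuda.max_memory_reserved(device) / 1024 ** 3
    print(f"peak allocated: {allocated:.2f} GiB | peak reserved: {reserved:.2f} GiB")

# Usage: torch.cuda.reset_peak_memory_stats() before training starts, then
# report_peak_memory() after a few batches. torch.cuda.memory_summary() gives a
# more detailed breakdown if fragmentation is suspected.
```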

sonsus commented 2 years ago

@zhaone I just found the change to mid_dim. Thanks.

  1. This reduces the mid_dim of the prefix MLP, right? (Reducing the pretrained GPT's hidden size would not make sense.) See the sketch after this list for how I understand mid_dim.
  2. In Table 2 there is a Prefix (0.1%) setting. Does that use preseqlen = 10 (0.1% of the parameters), or is some other part of the model reduced (e.g., mid_dim)? @XiangLi1999
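For context on what preseqlen and mid_dim control, here is a minimal sketch of the prefix reparameterization as I read the paper (a small embedding over the prefix positions expanded by an MLP into per-layer key/value activations). Module and variable names are illustrative, not the repo's actual code, and the comment on the 0.1% count is my own back-of-envelope reading, not confirmed by the author:

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    def __init__(self, preseqlen=10, mid_dim=512, n_layer=24, n_embd=1024):
        super().__init__()
        self.prefix_ids = torch.arange(preseqlen)
        self.embed = nn.Embedding(preseqlen, n_embd)
        # Only this MLP's mid_dim gets reduced; the pretrained LM is untouched.
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, mid_dim),
            nn.Tanh(),
            nn.Linear(mid_dim, n_layer * 2 * n_embd),  # keys + values per layer
        )
        self.n_layer, self.n_embd = n_layer, n_embd

    def forward(self, batch_size):
        # (preseqlen, n_embd) -> (preseqlen, n_layer * 2 * n_embd)
        prefix = self.mlp(self.embed(self.prefix_ids))
        # reshape to (batch, preseqlen, n_layer, 2, n_embd) to use as past key/values
        return prefix.view(1, -1, self.n_layer, 2, self.n_embd).expand(
            batch_size, -1, -1, -1, -1
        )

# Rough count for the 0.1% question (an assumption on my part): if only the final
# prefix activations are kept, that is preseqlen * n_layer * 2 * n_embd values;
# with preseqlen=10 on GPT-2 medium (24 layers, 1024 dims) that is ~0.49M numbers,
# i.e. roughly 0.1% of the 345M-parameter model.
```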