Closed: bjascob closed this issue 3 years ago.
You can reduce the batch size and increase gradient accumulation with this parameter to simulate larger batches without using as much memory. I don't know about speed, though. We usually train on a single V100 with fp16; this takes 7-8 hours (AMR3.0 takes longer).
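For readers unfamiliar with the trick: gradient accumulation runs several small forward/backward passes before a single optimizer step, so the effective batch is micro-batch times accumulation steps. Below is a minimal generic PyTorch sketch, not this repo's training code (which presumably goes through fairseq); the model, optimizer, and data are toy stand-ins.

```python
import torch
from torch import nn

# Toy stand-ins for illustration only.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]

ACCUM_STEPS = 8  # effective batch = 4 (micro-batch) * 8 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = nn.functional.mse_loss(model(inputs), targets)
    (loss / ACCUM_STEPS).backward()   # scale so accumulated grads average out
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()              # one weight update per 8 micro-batches
        optimizer.zero_grad()
```

Only the micro-batch's activations live in memory at once, which is why this trades speed for a much smaller footprint.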
Thanks for the info. Looks like 12GB is not enough memory for the large model.
I attempted to train the model using
bash run/run_experiment.sh configs/amr2.0-structured-bart-large-sep-voc.sh
and it looks like my older 12GB Titan X GPU doesn't have enough memory. Can you let me know what you used for training and approximately how long it takes to train? In the above config file I tried changing
BATCH_SIZE=128
to
BATCH_SIZE=1
and I'm still getting CUDA OOM errors. Is there something else I need to modify to reduce the memory? Do you know if this will train on a single 24GB GPU (i.e., an RTX 3090), and if so, how long that would take?
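Since the reply above mentions training with fp16, it may also help to see what mixed precision looks like in plain PyTorch, as it roughly halves activation memory. This is a generic sketch with toy stand-ins, not this repo's code (I am assuming the repo enables it via its fairseq training options rather than a hand-written loop):

```python
import torch
from torch import nn

# Generic mixed-precision sketch (requires a CUDA GPU; toy stand-ins only).
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()   # loss scaling keeps fp16 grads stable

inputs, targets = torch.randn(4, 16).cuda(), torch.randn(4, 1).cuda()

optimizer.zero_grad()
with torch.cuda.amp.autocast():        # forward pass runs in half precision
    loss = nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)                 # unscales grads, then steps the optimizer
scaler.update()
```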