Closed: bjascob closed this issue 3 years ago.
You can reduce the batch size and increase gradient accumulation with this parameter to simulate larger batches without using as much memory. I don't know about speed, though. We usually train on a single V100 with fp16; this takes 7-8 hours (AMR3.0 takes longer).
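For readers unfamiliar with the trick: gradient accumulation runs several small forward/backward passes before a single optimizer step, so the effective batch is micro-batch times accumulation steps. Below is a minimal generic PyTorch sketch, not this repo's training code (which presumably goes through fairseq); the model, optimizer, and data are toy stand-ins.

```python
import torch
from torch import nn

# Toy stand-ins for illustration only.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(32)]

ACCUM_STEPS = 8  # effective batch = 4 (micro-batch) * 8 = 32

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = nn.functional.mse_loss(model(inputs), targets)
    (loss / ACCUM_STEPS).backward()   # scale so accumulated grads average out
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()              # one weight update per 8 micro-batches
        optimizer.zero_grad()
```

Only the micro-batch's activations live in memory at once, which is why this trades speed for a much smaller footprint.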
Thanks for the info. Looks like 12GB is not enough memory for the large model.
I attempted to train the model using
bash run/run_experiment.sh configs/amr2.0-structured-bart-large-sep-voc.sh
and it looks like my older 12GB Titan X GPU doesn't have enough memory. Can you let me know what you used for training and approximately how long it takes to train? In the above config file I tried changing
BATCH_SIZE=128
to
BATCH_SIZE=1
and I'm still getting CUDA OOM errors. Is there something else I need to modify to reduce the memory? Do you know if this will train on a single 24GB GPU (i.e., an RTX 3090), and if so, how long that would take?
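Since the reply above mentions training with fp16, it may also help to see what mixed precision looks like in plain PyTorch, as it roughly halves activation memory. This is a generic sketch with toy stand-ins, not this repo's code (I am assuming the repo enables it via its fairseq training options rather than a hand-written loop):

```python
import torch
from torch import nn

# Generic mixed-precision sketch (requires a CUDA GPU; toy stand-ins only).
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()   # loss scaling keeps fp16 grads stable

inputs, targets = torch.randn(4, 16).cuda(), torch.randn(4, 1).cuda()

optimizer.zero_grad()
with torch.cuda.amp.autocast():        # forward pass runs in half precision
    loss = nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()
scaler.step(optimizer)                 # unscales grads, then steps the optimizer
scaler.update()
```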