IBM / transition-amr-parser

State-of-the-art (SoTA) Abstract Meaning Representation (AMR) parsing with word-node alignments in PyTorch. Includes checkpoints and other tools, such as statistical significance testing for Smatch.
Apache License 2.0

GPU requirements #19

Closed: bjascob closed this issue 3 years ago

bjascob commented 3 years ago

I attempted to train the model using bash run/run_experiment.sh configs/amr2.0-structured-bart-large-sep-voc.sh, and it looks like my older 12GB Titan X GPU doesn't have enough memory. Can you let me know what you used for training and approximately how long it takes to train?

In the above config file I tried changing BATCH_SIZE=128 to BATCH_SIZE=1 and I'm still getting CUDA OOM errors. Is there something else I need to modify to reduce the memory?

Do you know if this will train on a single 24GB GPU (e.g., an RTX 3090), and if so, how long that would take?

ramon-astudillo commented 3 years ago

You can reduce the batch size and increase gradient accumulation with this parameter:

https://github.com/IBM/transition-amr-parser/blob/master/configs/amr2.0-structured-bart-large-sep-voc.sh#L160

This simulates larger batches without using as much memory. I don't know about speed, though. We usually train on a single V100 with fp16; that takes 7-8 hours (AMR3.0 takes longer).
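For illustration, a minimal sketch of what such a config edit might look like. Only BATCH_SIZE appears in the original config file as quoted above; the gradient-accumulation variable name below is a placeholder, and the real one is whatever is defined at the linked line 160.

```bash
# Sketch of a memory-saving edit to configs/amr2.0-structured-bart-large-sep-voc.sh
# (UPDATE_FREQ is an illustrative placeholder, not necessarily the repo's variable name).

# Per-step batch small enough to fit on a smaller GPU.
BATCH_SIZE=32                # originally 128

# Accumulate gradients over several steps so the effective batch size
# stays at 32 x 4 = 128; set this via the parameter at the linked line 160.
UPDATE_FREQ=4
```

With this kind of setup the effective batch size (BATCH_SIZE times the number of accumulation steps) matches the original 128, but each update requires several forward/backward passes, so wall-clock training time grows correspondingly.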

bjascob commented 3 years ago

Thanks for the info. Looks like 12GB is not enough memory for the large model.