Closed: creisle closed this issue 3 years ago
Training the verdict prediction model with the default parameters requires substantial GPU memory. The baseline was trained on a Quadro RTX 8000 with 48 GB of memory, of which about 38-40 GB were actually used. However, you could try reducing the batch size and increasing `gradient_accumulation_steps` accordingly, so the effective batch size stays the same.
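As a rough illustration (not code from this repo), the trade-off behind that suggestion can be sketched in plain Python: averaging the mean gradients of several small micro-batches gives the same result as one large batch, for losses that average over examples, so memory drops while the update is unchanged. All names and numbers below are made up for the sketch.

```python
def grad(w, x, y):
    # Gradient of the squared error 0.5 * (w*x - y)**2 w.r.t. w.
    return (w * x - y) * x

def full_batch_grad(w, xs, ys):
    # Mean gradient over one large batch.
    return sum(grad(w, x, y) for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    # Gradient accumulation: average the mean gradients of equally
    # sized micro-batches before taking one optimizer step.
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        total += full_batch_grad(w, xs[lo:hi], ys[lo:hi])
    return total / steps

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Batch size 4 vs. batch size 1 with 4 accumulation steps: same update.
assert abs(full_batch_grad(w, xs, ys) - accumulated_grad(w, xs, ys, 1)) < 1e-12
```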
Thanks for getting back to me on this! I actually got around it eventually, but I had to update transformers to v4.11 so I could use the gradient checkpointing option. After that I was able to train with under 24 GB of GPU memory.
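For anyone else hitting this: a minimal sketch of the workaround, assuming transformers >= 4.11 (where `gradient_checkpointing_enable()` and the `gradient_checkpointing` training argument were introduced). The model name and batch sizes here are placeholders, not the repo's actual settings.

```python
# Config fragment, not a full training script. Gradient checkpointing
# recomputes activations during the backward pass, trading extra
# compute for a large reduction in activation memory.
from transformers import AutoModelForSequenceClassification, TrainingArguments

# Placeholder checkpoint; substitute the model used by the repo.
model = AutoModelForSequenceClassification.from_pretrained("roberta-large")
model.gradient_checkpointing_enable()  # requires transformers >= 4.11

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,  # smaller micro-batch to fit in memory
    gradient_accumulation_steps=8,  # keeps the effective batch size at 16
)
```

Combining checkpointing with a smaller per-device batch and more accumulation steps is what brought memory below 24 GB here.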
@Raldir I am trying to run fine-tuning for the verdict prediction model, but I keep running into CUDA out-of-memory errors. Do you remember what hardware specifications were required when you ran this?