Open · ecatkins opened this issue 4 years ago
Update: I did solve this by upgrading to a Tesla T4 (16 GB), though batch size is still pretty limited. Is this worth a note in the README? I've never had an issue with that GPU across other DL tasks (e.g., training TensorFlow object detection models), so it might be worth indicating to people what GPU memory they need to start with.
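For anyone triaging the same thing, a quick way to confirm how much headroom a card actually has before training, assuming nvidia-smi is available (it ships with the NVIDIA driver):

```bash
# List each visible GPU with its total and currently free memory.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```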
Original issue: When trying to fine-tune BERT on a classification task (run_classifier.py) using my own dataset, I am running into an OOM issue with the traceback below (it doesn't break the script; it just keeps running):
I've tried reducing the batch size from 32 to 16 to 4 to 1, none of which has any impact. I am using a Tesla P4 with 8 GB. Is my issue as simple as needing more GPU memory, or is something else going on?
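In case it helps, here is a minimal invocation sketch assuming this is run_classifier.py from google-research/bert, using only the flags its README documents; the paths, task name, and the $BERT_BASE_DIR/$GLUE_DIR variables are placeholders. Per the README's out-of-memory notes, attention cost grows quadratically with sequence length, so lowering --max_seq_length usually frees far more memory than lowering --train_batch_size alone:

```bash
# Placeholder environment: point these at a downloaded BERT checkpoint
# and your own dataset directory.
export BERT_BASE_DIR=/path/to/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue_data

# Halve --max_seq_length from the default 128 and shrink the batch
# before concluding the GPU is simply too small.
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=64 \
  --train_batch_size=8 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/classifier_output/
```

If BERT-Base still OOMs on 8 GB at max_seq_length=64 and batch size 8, that points to the card itself being the limit rather than the configuration.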