Closed ButteredGroove closed 5 years ago
My suggestion is to try running it on the CPU first; something else may be going wrong, and running on the GPU doesn't report the exact error.
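If it helps, one way to force a CPU-only run without changing the training code is to hide the GPUs before the framework initializes. This is a minimal sketch, assuming the trainer is PyTorch-based and falls back to CPU when no CUDA device is visible:

```python
# cpu_run_sketch.py -- minimal sketch, assuming a PyTorch-based trainer
import os

# Hide all CUDA devices *before* torch (or the training script) is imported,
# so torch.cuda.is_available() returns False and everything stays on the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print("CUDA visible:", torch.cuda.is_available())  # expected: False
# From here, launch the usual training entry point (e.g. import and call
# whatever src/train.py exposes); the original error should then surface
# with a full CPU-side stack trace instead of a bare GPU OOM.
```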
Good idea. Thank you.
I went ahead and found a GPU with more RAM (24 GB) and it worked. Because this was an increase in memory requirements driven by my own corpus rather than a clear issue with the code itself, I'll go ahead and close this.
Hi, and thanks for the AMR parser and paper! I was able to use it to train a model and get scores for LDC2017, and it ran fine. My K80's 12 GB of GPU memory peaked at around 10 GB of usage, but training finished without issue.
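In case it's useful for comparing runs, peak allocation can also be logged from inside the training loop. This is just a sketch, assuming the model runs under PyTorch (reset_peak_memory_stats and max_memory_allocated are standard torch.cuda calls):

```python
# memory_probe.py -- illustrative sketch, not part of the repo
import torch

def log_peak_gpu_memory(tag: str) -> None:
    """Print peak GPU memory allocated since the last reset, in GiB."""
    if torch.cuda.is_available():
        peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
        print(f"[{tag}] peak GPU memory allocated: {peak_gib:.2f} GiB")
        torch.cuda.reset_peak_memory_stats()

# e.g. call log_peak_gpu_memory(f"batch {i}") after each training step
```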
I then grabbed another data set and tried again. src/preprocessing, src/rule_system_build.py, and src/data_build.py all completed, but src/train.py crashed with an out-of-memory error. It gets as far as starting epoch 1, then crashes after processing a seemingly random number of batches.
I tried using smaller batch sizes via train.py's -batch_size argument, but even a batch size of 1 resulted in an OOM issue.
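Since even a single-example batch runs out of memory, one thing worth checking is whether the new corpus contains a few unusually long sentences. Here is a rough sketch of that check; the file path and plain whitespace tokenization are assumptions, not part of the repo:

```python
# sentence_length_check.py -- hypothetical helper, not part of the repo
from pathlib import Path

corpus = Path("data/my_corpus/train.txt")  # hypothetical path to the new data
lengths = sorted(
    (len(line.split()) for line in corpus.read_text(encoding="utf-8").splitlines()),
    reverse=True,
)
print("10 longest sentences (token counts):", lengths[:10])
```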
Is there another train.py setting you'd recommend to prevent the crash? Any other ideas?
Here are the details of the crash: