Vincent-Li-9701 closed this issue 5 years ago
Just to make sure: did you run the script that truncates the input length, as described in the paper? This step was added to the prep_data.py script. We used a single K80 with gradient accumulation (see scripts, e.g. this one), although making use of OpenNMT's multi-GPU training would be useful.
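For anyone unfamiliar with gradient accumulation, here is a minimal sketch of the idea in plain Python. It is not the OpenNMT implementation or the training script used for the paper. The toy model (`y ≈ w * x`), the data, and the names `grad_mse` and `accumulated_step` are all made up for illustration. The point is that several small forward/backward passes, summed before a single update, give the same step as one large batch while only one small batch has to be in memory at a time.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ≈ w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, data, micro_batch_size, lr):
    """One parameter update assembled from several micro-batches.

    Hypothetical helper for illustration only: each micro-batch gradient is
    weighted by its share of the examples, so the accumulated value equals
    the gradient of the full batch.
    """
    accum = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro_batch = data[start:start + micro_batch_size]
        # Only this micro-batch is processed at once, which is what keeps
        # peak memory low on a real model.
        accum += grad_mse(w, micro_batch) * len(micro_batch) / len(data)
    # A single update is applied after all micro-batches have been seen.
    return w - lr * accum


# Toy data (assumed, not from the paper).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w_full = 0.0 - 0.01 * grad_mse(0.0, data)  # one step on the full batch
w_accum = accumulated_step(0.0, data, micro_batch_size=2, lr=0.01)
```

Here `w_full` and `w_accum` should agree up to floating-point error. On a real model the trade-off is more passes per update in exchange for a smaller per-pass memory footprint.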
The OOM was caused by the input not being tokenized. See the thread for additional details. Closing for now; feel free to reopen.
Hi Alex,
Thank you for the code and data. I was trying to reproduce the results in the paper, but I keep running into out-of-memory errors. How many GPUs did you use for your experiments?
Thank you