agitter closed 4 years ago
The PyTorch and TensorFlow examples both ran with the 3 GB memory request.
As an aside, this was a confusing problem for me. One job with insufficient memory for Docker was in a running state for about an hour, but I couldn't SSH to it to inspect it and couldn't figure out what was wrong until it was eventually held.
@agitter this is good to know (about the memory issue) - we've had this issue before and thought it was resolved. I'll pass that along.
Closes #5
This adds the new GPU Lab flags to our existing submit file examples:
I confirmed that two of the jobs ran successfully, but the PyTorch and TensorFlow examples were held. The reason was:
so I increased their memory request. I'll confirm they run successfully before we merge.
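For readers without the diff at hand, here is a minimal sketch of what a Docker GPU submit file with the larger memory request might look like. It is not copied from this PR. The `+WantGPULab` and `+GPUJobLength` lines are assumed to be the GPU Lab flags in question, based on CHTC's GPU documentation. The image name, script name, and disk request are hypothetical placeholders. Only the 3 GB memory figure comes from this thread.

```
# Hypothetical sketch, not the submit file from this PR.
# Comments sit on their own lines because HTCondor does not
# treat a trailing "#" on a command line as a comment.

universe = docker
# Placeholder image name
docker_image = pytorch/pytorch:latest

# Hypothetical wrapper script
executable = run_example.sh

log = job_$(Cluster)_$(Process).log
output = job_$(Cluster)_$(Process).out
error = job_$(Cluster)_$(Process).err

# Assumed GPU Lab flags (per CHTC's GPU documentation)
+WantGPULab = true
+GPUJobLength = "short"

request_gpus = 1
request_cpus = 1
# 3 GB is the request that let the PyTorch and TensorFlow examples run
request_memory = 3GB
# Placeholder disk request
request_disk = 10GB

queue 1
```

If a job submitted this way is held, `condor_q -hold` lists the hold reason for each held job, which is one way to spot an insufficient memory request.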