CHTC / templates-GPUs

Template job submissions using GPUs in CHTC
MIT License

Add CHTC GPU Lab flags to sub files #7

Closed by agitter 4 years ago

agitter commented 4 years ago

Closes #5

This adds the new GPU Lab flags to our existing submit file examples:

+WantGPULab = true
+GPUJobLength = "short"
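
For context, here is a minimal sketch of where these two lines might sit in a submit file. Only the two `+` attributes come from this PR. The `request_gpus` value and the `queue` statement are standard HTCondor submit commands added for illustration, and the values are assumptions rather than what the examples in this repository actually use.

```
# Fragment of an HTCondor submit file (illustrative, not a complete job).
# request_gpus = 1 is an assumed value for this sketch.
request_gpus = 1

# GPU Lab flags added by this PR. Custom job attributes start with "+"
# and must appear before the queue statement.
+WantGPULab = true
+GPUJobLength = "short"

queue 1
```

Comments are placed on their own lines because text after a value on the same line is read as part of the value.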

I confirmed that two of the jobs ran successfully, but the PyTorch and TensorFlow examples were held. The reason was:

Hold reason: Error from slot1_7@gpulab2002.chtc.wisc.edu: Docker job has gone over memory limit of 1024 Mb

so I increased their memory request. I'll confirm they run successfully before we merge.

agitter commented 4 years ago

The PyTorch and TensorFlow examples both ran with the 3 GB memory request.
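
A sketch of the kind of change involved, assuming the request is expressed with HTCondor's standard `request_memory` submit command; the exact line in the PyTorch and TensorFlow submit files may be written differently.

```
# The held jobs reported a Docker memory limit of 1024 Mb.
# Requesting 3 GB, as described above, let both examples run.
request_memory = 3GB
```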

As an aside, this was a confusing problem for me. One job with insufficient memory for Docker was in a running state for about an hour, but I couldn't SSH to it to inspect it and couldn't figure out what was wrong until it was eventually held.

ChristinaLK commented 4 years ago

@agitter this is good to know (about the memory issue) - we've had this issue before and thought it was resolved. I'll pass that along.