Closed: tatarsky closed this issue 9 years ago
These are jobs that are not even running yet, so I am going to remove them from the queue and discuss it with the user in the morning. I feel it's more important to protect the state of running jobs. If you disagree, comment below.
I have figured it out. This is a result of these jobs in the lowpriority queue. They have been pre-empted multiple times and each time the output file is appended to. The result is a massive Torque spool file.
I am emailing the user to suggest these be re-queued in the regular batch queues, or that the stdout be reduced....
Thanks Paul, I'm sorry I missed this earlier. Actually, I know this user well because she interned with me at CHOP a while ago. Sounds like it's all sorted out, and given my experience with her in the past, she's a very good user and learns quickly. I believe I can get a hold of her if you don't hear back.
Juan
Yes, email contact made and we're looking at the issue. Jobs with the large spool files have been killed to be re-run, and disk usage is now normal. The immediate crisis is over, and I will work with the user to determine the best way forward!
Just a hunch: would the qsub option `-k` help in this case? The huge output files would be written to the user's home and would not be held in the spool.
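For reference, a minimal sketch of the `-k` (keep) option as Torque/PBS documents it: the argument selects which streams are kept in the user's home directory during execution rather than staged in the server spool. The script name here is hypothetical.

```shell
# Keep both stdout (o) and stderr (e) in the submitting user's $HOME
# during execution, instead of the server's spool area.
qsub -k oe myjob.sh

# The same directive can live inside the job script itself:
#PBS -k oe
```

With `-k oe`, a job that is preempted and restarted appends to files under the user's quota-controlled home rather than to the shared spool, which is the failure mode seen here.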
It's part of the items I am mentioning ;)
Sorry, didn't see that.
You could not have seen it as it will be in an email ;)
Thanks for the quick action!
Three thoughts:
For the first two, I will look, but I did not see one.
For the latter, it's already in my notes for said new master node spec! (That disk was already something I considered undersized, but I had never seen this case of it filling before.)
I am attempting to contact @angelamyu
If you know this user please try to assist.
I have tried email and am going to call shortly. Several jobs by this user have extremely large stdout files in the Torque spool. We almost filled the Torque spool disk a moment ago.
I am unclear why these jobs have such massive (150 MB and above) stdout files in the spool, but I need to examine the jobs with this person, or kill those jobs, which I am trying not to do.
I have managed to clear some space but the spool area is not large and has never had this happen before.
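For spotting this kind of problem early, a quick sketch of a check for oversized spool files. The spool path is an assumption (it varies by install; `/var/spool/torque/spool` is a common default) and the 100 MB threshold is arbitrary:

```shell
# List spool files larger than 100 MB, largest first.
# The spool path below is a common default; adjust for your install.
find /var/spool/torque/spool -type f -size +100M -exec ls -lhS {} +
```

The job ID embedded in each spool filename identifies which jobs to chase down with the user.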