jchodera opened 8 years ago
Meanwhile, there are 2205 idle threadslots:

```
[chodera@mskcc-ln1 ~]$ checkjob 7190964
job 7190964

AName: cluster-src
State: Idle
Creds: user:chodera group:jclab class:batch qos:preemptor
WallTime: 00:00:00 of 16:00:00
BecameEligible: Wed May 11 18:50:16
SubmitTime: Wed May 11 18:49:00
(Time Queued Total: 2:22:52:11 Eligible: 2:22:51:24)

TemplateSets: DEFAULT
Total Requested Tasks: 32

Req[0] TaskCount: 32 Partition: ALL
Opsys: --- Arch: --- Features: batch
Dedicated Resources Per Task: PROCS: 1 MEM: 3072M

SystemID: MSKCC
SystemJID: 7190964
Notification Events: JobFail

BypassCount: 12848
Flags: RESTARTABLE,SUSPENDABLE,PREEMPTOR
Attr: checkpoint
StartPriority: 12469
NOTE: job req cannot run in partition MSKCC (available procs do not meet requirements : 0 of 32 procs found)
idle procs: 2205 feasible procs: 0

Node Rejection Summary: [Features: 39][State: 5][Reserved: 34]
```
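The `Node Rejection Summary` line is the key diagnostic here: it says why each node was excluded from consideration. As a quick sanity check, a minimal Python sketch (assuming the bracketed `[Reason: count]` format shown above; the `parse_rejections` helper is illustrative, not a Moab tool) can tally the reasons:

```python
import re

# Rejection summary line copied from the `checkjob` output above.
summary = "Node Rejection Summary: [Features: 39][State: 5][Reserved: 34]"

def parse_rejections(line):
    """Parse a Moab node-rejection summary into a {reason: count} dict."""
    return {reason: int(count)
            for reason, count in re.findall(r"\[(\w+): (\d+)\]", line)}

rejections = parse_rejections(summary)
print(rejections)  # {'Features': 39, 'State': 5, 'Reserved': 34}
```

Here most nodes were rejected on `Features`, which is consistent with the `Features: batch` requirement in the job's `Req[0]` block not matching the available nodes.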
So much for including those results in my talk...
Explain the above please.
Here's what happened:

- Ran `showstart`: predicted < 1 day start time
- Ran `showstart` again a day later: still predicted ~ 1 day start time

In the original design requirements, we selected Torque/Moab because it could provide max wait time estimates for jobs, precisely for occasions like this involving hard deadlines: if your job was predicted to take too long, you could try an alternative partitioning to get it done in the required time (or negotiate with other groups to hold/stop some jobs).

Somehow, max wait time reporting is now broken, or something very non-obvious is going on with these jobs.
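The deadline workflow described above, checking whether a job's predicted start time (from `showstart`) plus its requested walltime still fits before a hard deadline, can be sketched as follows. This is a minimal illustration, not part of Moab: the `fits_deadline` helper, the dates, and the 16-hour walltime are all assumed for the example.

```python
from datetime import datetime, timedelta

def fits_deadline(predicted_start, walltime, deadline):
    """Return True if a job starting at predicted_start with the given
    requested walltime would finish at or before the hard deadline."""
    return predicted_start + walltime <= deadline

deadline = datetime(2016, 5, 13, 9, 0)    # e.g. morning the results are needed
predicted = datetime(2016, 5, 12, 12, 0)  # predicted start, as from `showstart`

print(fits_deadline(predicted, timedelta(hours=16), deadline))  # True
```

If the check fails, the job could be resubmitted with a different partitioning (fewer tasks, shorter walltime) and the prediction re-checked.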
I refer to your abusive comment. I have filed a complaint and will not assist you further.
I'm genuinely unclear on which part was the abusive comment. Was it the reference to not getting data in time for my talk? I'm certainly not trying to offend anyone!

Oh! If it was the "Since I have no idea how the hell support is being handled now" comment in the mail to hpc-request, that was simply my confusion about feeling in the dark regarding the proper procedure for reporting issues. I think there was a migration away from primary support via this issue tracker, as I was surprised to see the hal login message had suddenly changed, without fanfare, to ask users to email hpc-request. This was certainly not a comment on response times or service quality, just a commentary on being left in the dark regarding support request procedures. Apologies if that led to offense; it certainly was unintended.
To be clear: We've received no official communication regarding a change in support procedure.
I consider "So much for including those results in my talk..." unnecessary and abusive.
I provide support within the terms of my scope of work, none of which includes weekend support, but I do it anyway. I work very hard to resolve problems and do not deserve such statements.

Talk to Juan Perin on Monday. You'll get no further help from me.
Clarifying email sent. Will talk to Juan when I return from conference. Weekend support certainly not expected or needed---deadline for getting data in time for writing talk had already passed when issue was filed.
I have a few multicore jobs queued up:
The estimated start time (from `showstart`) was first a day, and then after a day was still a day, and now (after another day) is up to NINE DAYS:

What is going on here?