JuliaParallel / ClusterManagers.jl

Slurm robust to no jobid and to node warnings #139

Closed grahamas closed 3 years ago

grahamas commented 4 years ago

Resolves #127

Also resolves a bug where the port and IP info are not parsed when a warning or error from the node (e.g. "TMPDIR could not be created") occupies the first lines of the output file.
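To illustrate the idea behind that fix, here is a minimal sketch in Python (not the package's Julia code). It assumes the worker announces itself with a line of the form `julia_worker:<port>#<ip>`; the function name `find_worker_address` and the sample log lines are hypothetical. Rather than expecting the address on the first line, it scans every line and skips anything that does not match.

```python
import re

# Assumed worker announcement format: "julia_worker:<port>#<ip>".
WORKER_LINE = re.compile(r"julia_worker:(\d+)#(\S+)")


def find_worker_address(lines):
    """Return (ip, port) from the first matching line, or None.

    Lines that do not match (node warnings, errors, blank lines)
    are skipped instead of being treated as a parse failure.
    """
    for line in lines:
        match = WORKER_LINE.search(line)
        if match:
            return match.group(2), int(match.group(1))
    return None


# Hypothetical output file contents: a node warning precedes the address.
log = [
    "slurmstepd: error: TMPDIR could not be created",
    "julia_worker:9009#10.0.0.5",
]
print(find_worker_address(log))  # ('10.0.0.5', 9009)
```

Returning `None` when no line matches lets the caller keep polling the file until the worker has written its address.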

Related but unnecessary changes: I removed "job_" from the output file name, and I stopped the automatic deletion of previous logs. At the very least, that deletion should be controlled by a flag.

bjarthur commented 3 years ago

are there any slurm users out there who would like to review this PR?

grahamas commented 3 years ago

@kescobo Thanks for reviewing! If my explanation of that loop makes sense, then I think we're all good.

kescobo commented 3 years ago

Cool :+1: if you could just add a comment to the loop to clarify the point of it? (I know you didn't introduce it, but would be great to document it while it's top of mind). Then this is ready to merge AFAIC

grahamas commented 3 years ago

Comments added! Plus a few more.