Closed meiertgrootes closed 2 years ago
Uhm, I am getting this error:
Submitted batch job 3108596
Waiting for SLURM output: slurm-3108596.out ...
found
SLURM outputfile slurm-3108596.out present. Retrieving node information
Traceback (most recent call last):
File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 368, in <module>
main()
File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 358, in main
forwardconfig = ssh_remote_executor(config_inputs, check_and_retrieve_SLURM_info, outfilename, args)
File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 191, in ssh_remote_executor
result = func(conn, *inargs)
File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 220, in check_and_retrieve_SLURM_info
info_present = check_for_node_info(conn, outfilename)
File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 274, in check_for_node_info
result = conn.run(cmd)
File "<decorator-gen-3>", line 2, in run
File "/opt/miniconda3/lib/python3.9/site-packages/fabric/connection.py", line 30, in opens
return method(self, *args, **kwargs)
File "/opt/miniconda3/lib/python3.9/site-packages/fabric/connection.py", line 725, in run
return self._run(self._remote_runner(), command, **kwargs)
File "/opt/miniconda3/lib/python3.9/site-packages/invoke/context.py", line 102, in _run
return runner.run(command, **kwargs)
File "/opt/miniconda3/lib/python3.9/site-packages/fabric/runners.py", line 72, in run
return super(Remote, self).run(command, **kwargs)
File "/opt/miniconda3/lib/python3.9/site-packages/invoke/runners.py", line 380, in run
return self._run_body(command, **kwargs)
File "/opt/miniconda3/lib/python3.9/site-packages/invoke/runners.py", line 442, in _run_body
return self.make_promise() if self._asynchronous else self._finish()
File "/opt/miniconda3/lib/python3.9/site-packages/invoke/runners.py", line 509, in _finish
raise UnexpectedExit(result)
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!
Command: "cd ~ && cat slurm-3108596.out | grep '/path/to/private/ssh/key' - "
Exit code: 1
Stdout: already printed
Stderr: already printed
@fnattino ok, sorry. I'll check. Had to go pick Emil up. I'll get back to it this evening. Is that soon enough for your purposes?
Sure no worries! Will also get back to this later
Ready for review.
Hi @fnattino, I've added the check for server state. This times out if the server isn't running 10 seconds after the node information becomes available.
The order of execution is: check for the output file (user-specified patience); if successful, check for node information (10 s patience); if successful, check that the server is running (10 s patience); if successful, retrieve the node information, forward the port, and launch the browser.
This should now deal with things nicely
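For reference, the staged checks described above can be sketched roughly as follows. This is only an illustration of the pattern, not the code in `runJupyterDaskOnSLURM.py`: the helper names `wait_until` and `first_timed_out_stage`, the `interval` parameter, and the stage labels are all hypothetical.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type for one stage: (label, check function, patience in seconds).
Stage = Tuple[str, Callable[[], bool], float]


def wait_until(check: Callable[[], bool], patience: float, interval: float = 1.0) -> bool:
    """Poll `check` until it returns True or `patience` seconds have elapsed."""
    deadline = time.monotonic() + patience
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def first_timed_out_stage(stages: Iterable[Stage], interval: float = 1.0) -> Optional[str]:
    """Run the stages in order; return the label of the first one that times out.

    Returns None if every stage succeeded within its own patience.
    """
    for label, check, patience in stages:
        if not wait_until(check, patience, interval):
            return label
    return None
```

In the actual script each `check` would presumably wrap a remote command. With Fabric, passing `warn=True` to `conn.run(...)` returns a result whose `.ok` attribute can be tested, so a `grep` that finds nothing yet (exit code 1) can be treated as "not ready" rather than raising `UnexpectedExit` as in the traceback above. Only once all stages pass would the script go on to retrieve the node information, forward the port, and launch the browser.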
Thanks @meiertgrootes, amazing. I spotted a few typos and some duplicate text and fixed them in #16. If you agree with everything, go ahead and merge both!
Merging with @fnattino's improvements.
Hi @fnattino, here is a suggested solution for issue #14. It still needs docstrings and a final test.