RS-DAT / JupyterDaskOnSLURM

Apache License 2.0

added check for file content #15

Closed meiertgrootes closed 2 years ago

meiertgrootes commented 2 years ago

Hi @fnattino, here is a suggested solution for issue #14. It still needs docstrings and a final test.

fnattino commented 2 years ago

Uhm, I am getting this error:

Submitted batch job 3108596
Waiting for SLURM output: slurm-3108596.out ...
found
SLURM outputfile slurm-3108596.out present. Retrieving node information
Traceback (most recent call last):
  File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 368, in <module>
    main()
  File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 358, in main
    forwardconfig = ssh_remote_executor(config_inputs, check_and_retrieve_SLURM_info, outfilename, args)
  File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 191, in ssh_remote_executor
    result = func(conn, *inargs)
  File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 220, in check_and_retrieve_SLURM_info
    info_present = check_for_node_info(conn, outfilename)
  File "/Users/fnattino/Projects/RS-DAT/Repos/JupyterDaskOnSLURM/runJupyterDaskOnSLURM.py", line 274, in check_for_node_info
    result = conn.run(cmd)
  File "<decorator-gen-3>", line 2, in run
  File "/opt/miniconda3/lib/python3.9/site-packages/fabric/connection.py", line 30, in opens
    return method(self, *args, **kwargs)
  File "/opt/miniconda3/lib/python3.9/site-packages/fabric/connection.py", line 725, in run
    return self._run(self._remote_runner(), command, **kwargs)
  File "/opt/miniconda3/lib/python3.9/site-packages/invoke/context.py", line 102, in _run
    return runner.run(command, **kwargs)
  File "/opt/miniconda3/lib/python3.9/site-packages/fabric/runners.py", line 72, in run
    return super(Remote, self).run(command, **kwargs)
  File "/opt/miniconda3/lib/python3.9/site-packages/invoke/runners.py", line 380, in run
    return self._run_body(command, **kwargs)
  File "/opt/miniconda3/lib/python3.9/site-packages/invoke/runners.py", line 442, in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
  File "/opt/miniconda3/lib/python3.9/site-packages/invoke/runners.py", line 509, in _finish
    raise UnexpectedExit(result)
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!

Command: "cd ~ && cat slurm-3108596.out | grep '/path/to/private/ssh/key' - "

Exit code: 1

Stdout: already printed

Stderr: already printed
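
A note on the traceback above: the failing command is a `grep`, and `grep` exits with code 1 when no line matches, which is what happens if the SLURM output file exists but the expected text has not been written yet. By default, `run()` in fabric/invoke raises `UnexpectedExit` on any nonzero exit code; passing `warn=True` returns the result instead, so the exit code can be inspected. The following is a minimal sketch of that idea, not necessarily the fix adopted in this PR. The function name `node_info_present` and its parameters are hypothetical, and `conn` is only assumed to behave like a fabric `Connection`.

```python
import shlex


def node_info_present(conn, outfilename, marker):
    """Return True if `marker` occurs in the remote file, False if it does not.

    Hypothetical helper: `conn` is assumed to behave like a fabric Connection.
    With warn=True, run() returns the result instead of raising UnexpectedExit
    on a nonzero exit code, so grep's "no match" status can be handled here.
    """
    # Quote both arguments so unusual characters cannot break the command.
    cmd = "grep -q -- {} {}".format(shlex.quote(marker), shlex.quote(outfilename))
    result = conn.run(cmd, warn=True, hide=True)
    if result.exited == 0:
        return True   # grep found at least one matching line
    if result.exited == 1:
        return False  # grep ran, but no line matched (yet)
    # grep uses exit codes above 1 for real errors, e.g. a missing file.
    raise RuntimeError(
        "command failed with exit code {}: {}".format(result.exited, cmd)
    )
```

A caller can then poll this function until it returns `True` or a timeout expires, rather than letting the first empty `grep` abort the whole script.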

meiertgrootes commented 2 years ago

@fnattino ok, sorry. I'll check. Had to go pick Emil up. I'll get back to it this evening. Is that soon enough for your purposes?

fnattino commented 2 years ago

Sure no worries! Will also get back to this later

meiertgrootes commented 2 years ago

Ready for review.

meiertgrootes commented 2 years ago

Hi @fnattino, I've added the check for server state. It times out if the server isn't running 10 seconds after the node information becomes available.

The order of execution is: check for the output file (user-specified patience); if successful, check for the node information (10 s patience); if successful, check that the server is running (10 s patience); if successful, retrieve the node information, forward the port, and launch the browser.

This should now deal with things nicely
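
The staged checks described above can be sketched as a chain of polling loops, each with its own patience. This is only an illustration under assumptions, not the code in `runJupyterDaskOnSLURM.py`: the names `wait_for` and `run_startup_checks`, the polling interval, and the stage tuples are all hypothetical.

```python
import time


def wait_for(check, patience, interval=1.0, sleep=time.sleep):
    """Poll `check()` until it returns True or `patience` seconds have passed.

    Returns True on success and False on timeout. The check always runs at
    least once, even when patience is 0.
    """
    deadline = time.monotonic() + patience
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))


def run_startup_checks(stages, **wait_kwargs):
    """Run (name, check, patience) stages in order, stopping at the first timeout.

    Returns the name of the stage that timed out, or None if all succeeded.
    """
    for name, check, patience in stages:
        if not wait_for(check, patience, **wait_kwargs):
            return name
    return None
```

With stages such as `("output file", file_exists, user_patience)`, `("node info", info_present, 10)` and `("server", server_running, 10)`, where the three callables are placeholders, the retrieve, port-forward and browser steps would run only when `run_startup_checks` returns `None`.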

fnattino commented 2 years ago

Thanks @meiertgrootes, amazing. I spotted a few typos and some duplicate text and fixed them in #16. If you agree with everything, go ahead and merge both!

meiertgrootes commented 2 years ago

Merging with @fnattino's improvements.