alces-software / adminware

A sandbox CLI for running commands remotely across nodes
1 stars 0 forks source link

Fix/interrupt multiple tools #168

Closed DavidMarchant closed 5 years ago

DavidMarchant commented 5 years ago

Small changes to run

Renames execute_threaded_batches to execute_batches as it no longer truly uses threads

Stops an interrupt not properly stopping all executing tools (Fixes #164). In the end I went with just raising a ClickException over raising a KeyboardInterrupt, re-rasing a CancelledError or somehow manually signalling back that there'd been an interrupt. What do you think? The 'Aborted!' text is to be inline with how click receives an interrupt in any other places

WilliamMcCumstie commented 5 years ago

LGTM, I have removed the additional print as the interrupt doesn't need to flag an error has occurred. At some point we should review the printing during run in general.

Scratch that, this PR can be implemented in more succinct manner

WilliamMcCumstie commented 5 years ago

@DavidMarchant, Whilst the previous implementation works, catching the interrupt and re-raising the CancelledError has a bit of a code smell to it.

The issue that interrupt doesn't cancel is due to the funky control logic within the runner method. AFACIS execute_batches is fine as is.

I am going to comment on a few code sections that need some refactoring. Once this has been completed, the interrupt handling should fall into place. They all concern this runner method: https://github.com/alces-software/adminware/blob/f56128a36d14c4c5e7fdc0313011f575dc94407f/src/commands/run.py#L49

I have reset the HEAD of this branch, so you will need checkout your local working directory again.

WilliamMcCumstie commented 5 years ago

The run method should mostly operate of the batch object and not the config. This is a trivial change as the main purpose of the Batch object is to act as a bridge between Jobs and `Configs. However it will have code readability implications.

This means one of the first lines of the runner will be something like:

batches = list(map(..., configs)))
WilliamMcCumstie commented 5 years ago

Next there is a bit of error checking whether the command is valid to run. Currently we use multiple horrendous nested if statements. This makes it next to impossible to follow the logic without stepping through the code execution.

Instead all these error checks should be done up front. For readability purposes the error checks should be extracted as inner functions and then called in rapid succession:

def runner(...):
  def error_if_interactive_in_tool_group():
    ...
  def error_if_multiple_interactive_jobs():
    ...
  def error_if_no_nodes():
    ... # This needs a slightly different error for interactive jobs

  ...
  error_if_interactive_in_tool_group()
  error_if_multiple_interactive_jobs()
  error_if_no_nodes()
  ...
WilliamMcCumstie commented 5 years ago

Now that the error checks have been preformed, we know the jobs can be ran. The only headache is if it needs to be ran in quiet mode or not. This probably needs some refactoring to make it more intelligent, however this can be delayed until proper use cases for the printing are defined.

For the time being, a method should be defined to determine if the batches should be ran in quiet mode or not:

def is_quiet():
  first = batches[0]
  if batches.length > 1: False # Always run multiple jobs with printing
  elif first.is_interactive? || first.is_report?: True
  else: False

You probably have noticed that tool families always run with full printing. This is so it is obvious which tool is running when. This does mean that logging will be on for report jobs. This issue can be addressed at a latter date.

WilliamMcCumstie commented 5 years ago

Finally now that there is a batches array and is_quiet method, all the batches can be ran with:

execute_batches(batches, quiet = is_quiet())

At some point before this call, the Jobs do need to be added to the batch objects with:

batch.build_jobs(*nodes)

I'll leave this step for you to slot in where appropriate

WilliamMcCumstie commented 5 years ago

@DavidMarchant I have rebased this PR on #171 and resolved the conflicts. It was quite a tricky task to preform this rebase due to the number of changes. As a result there are a few commits that fix the rebase.

Part of the reason is the commits messages don't tell me much about the changes. A prime example would be: https://github.com/alces-software/adminware/pull/168/commits/b91dd9bcf45608e1ec1e08a46950376905a58bc9

It could have been broken into the individual refactoring steps, which would have made it easier to preform the rebase. Also commit messages should focus on the "why" not the "what". I can see what changed in the diff so rehashing it in text isn't super useful.

This is discussed in: https://chris.beams.io/posts/git-commit/ And a good example of a commit is: https://github.com/bitcoin/bitcoin/commit/eb0b56b19017ab5c16c745e6da39c53126924ed6

WilliamMcCumstie commented 5 years ago

Rebased on develop

DavidMarchant commented 5 years ago

Yep this is all code that I've seen before, LGTM

Apologies for the conflated & generalized commit, I'll make the next ones more distinct & informative