alces-software / adminware

A sandbox CLI for running commands remotely across nodes

Revamp the job running to use async programming #153

Closed WilliamMcCumstie closed 5 years ago

WilliamMcCumstie commented 5 years ago

The previous architecture had a few issues, mainly concerning its use of threading. Each job used to have its own thread, which ran both the db connection and the ssh connection. The multiple ssh connections are fine, as fabric handles them reasonably well. However, sqlalchemy does not play well with threads and has a connection limit. If the connection limit is exceeded, the db locks and future calls fail.

Instead, asynchronous programming is now used in the main thread. This allows all the jobs to be launched from a single thread and thus requires only one db connection. The problem with this is that fabric.Connection().run is a blocking operation and cannot be used asynchronously. To work around this, the connection needs to be run in its own thread, which the main thread can await. This essentially turns the blocking operation into an async command.
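A minimal sketch of the pattern, assuming the standard library's asyncio and its default thread pool. This is not the adminware code: `blocking_run`, `run_job` and `run_all` are hypothetical names, and `blocking_run` stands in for the blocking fabric call so the example runs without any ssh hosts.

```python
import asyncio
import time


def blocking_run(node, command):
    # Hypothetical stand-in for a blocking call such as
    # fabric.Connection(node).run(command).
    time.sleep(0.1)
    return f"{node}: ran {command!r}"


async def run_job(node, command):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the default thread pool. The coroutine
    # suspends here, so the main thread is free to launch other jobs.
    return await loop.run_in_executor(None, blocking_run, node, command)


async def run_all(nodes, command):
    # Every job is launched from the single main-thread event loop;
    # results come back in the same order as the nodes.
    return await asyncio.gather(*(run_job(node, command) for node in nodes))


if __name__ == "__main__":
    for line in asyncio.run(run_all(["node01", "node02", "node03"], "uptime")):
        print(line)
```

Only the blocking call lives in a worker thread, so anything that touches the db can stay in the main thread and share one connection.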

This change has some other added benefits such as:

  1. Callbacks that ensure the db connection is closed,
  2. Built-in signal handling, and
  3. The ability to cancel future jobs and end current ones.
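The three benefits above can be sketched roughly as follows, again using only asyncio. Everything here is illustrative rather than taken from adminware: `FakeSession` is a hypothetical stand-in for a db session, `job` stands in for a real job, and `interrupt_after` simulates a Ctrl-C so the example can run unattended.

```python
import asyncio
import signal


class FakeSession:
    """Hypothetical stand-in for a db session; only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


async def job(index, delay):
    # Hypothetical job: sleeps instead of running a remote command.
    await asyncio.sleep(delay)
    return index


async def run_jobs(session, delays, interrupt_after=None):
    loop = asyncio.get_running_loop()
    tasks = [loop.create_task(job(i, d)) for i, d in enumerate(delays)]

    def cancel_all():
        # Cancels pending and running tasks; finished tasks are unaffected.
        for task in tasks:
            task.cancel()

    installed = False
    try:
        # Signal handling via the event loop (Unix, main thread only).
        loop.add_signal_handler(signal.SIGINT, cancel_all)
        installed = True
    except (NotImplementedError, RuntimeError):
        pass  # platform or thread does not support loop signal handlers

    if interrupt_after is not None:
        # Simulated interrupt, so the sketch does not need a real Ctrl-C.
        loop.call_later(interrupt_after, cancel_all)

    batch = asyncio.gather(*tasks, return_exceptions=True)
    # The callback fires however the batch ends, so the session is always closed.
    batch.add_done_callback(lambda _future: session.close())
    results = await batch

    if installed:
        loop.remove_signal_handler(signal.SIGINT)
    return results


if __name__ == "__main__":
    session = FakeSession()
    print(asyncio.run(run_jobs(session, [0.01, 5.0], interrupt_after=0.1)))
    print("session closed:", session.closed)
```

One caveat with this approach: cancelling a task that is awaiting a worker thread stops the await, but not the thread itself, so ending a job that is already running needs extra handling on the connection side.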