Currently, Jobson's internal workflow looks like this:
1. A client submits an `APIJobRequest`
2. The server validates it against the spec, creating a `ValidJobRequest`
3. The server submits the `ValidJobRequest` to the `JobManager`
4. The `JobManager` persists the `ValidJobRequest` to get a `PersistedJobRequest`
5. The `JobManager` queues the `PersistedJobRequest` in memory to get a `QueuedJob`
6. The `JobManager` attempts to dequeue a `QueuedJob`
7. If the number of executing jobs < `maxConcurrentJobs` (config), the job is sent to a `JobExecutor` for execution, creating an `ExecutingJob`
8. The `JobManager` adds the `ExecutingJob` to an internal map, so it knows what's running
9. The executing subprocess sends updates to the `JobManager` as it runs. The `JobManager` uses these (especially exit codes) to update its internal state (executing jobs, etc.) and schedule more jobs
I'm omitting some of the details, but that's the general gist of it.
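The wrapper types in the steps above effectively form a linear job lifecycle. A minimal Java sketch of that pipeline (the state names and transition map here are illustrative, not Jobson's actual types):

```java
import java.util.EnumMap;
import java.util.Map;

public class JobLifecycleSketch {

    // One state per wrapper type in the workflow above (hypothetical names).
    enum JobState { SUBMITTED, VALIDATED, PERSISTED, QUEUED, EXECUTING, FINISHED }

    // Each state's legal successor, mirroring the linear pipeline.
    static final Map<JobState, JobState> NEXT = new EnumMap<>(JobState.class);
    static {
        NEXT.put(JobState.SUBMITTED, JobState.VALIDATED);
        NEXT.put(JobState.VALIDATED, JobState.PERSISTED);
        NEXT.put(JobState.PERSISTED, JobState.QUEUED);
        NEXT.put(JobState.QUEUED, JobState.EXECUTING);
        NEXT.put(JobState.EXECUTING, JobState.FINISHED);
    }

    public static void main(String[] args) {
        // Walk the pipeline from SUBMITTED to its terminal state.
        JobState s = JobState.SUBMITTED;
        StringBuilder path = new StringBuilder(s.name());
        while (NEXT.containsKey(s)) {
            s = NEXT.get(s);
            path.append(" -> ").append(s.name());
        }
        System.out.println(path);
    }
}
```

The point of the sketch is that the pipeline itself is simple; the complexity described below comes from the `JobManager` owning persistence and threading on top of it.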
Clearly, the `JobManager` does a lot of the legwork required to maintain Jobson's internal state. The main glaring flaws are:
- The `JobManager` handles a lot of persistence: persisting the initial job, persisting updates, persisting signals from the process, handling abortion signals, persisting job outputs, etc.
- Many threads pass through the `JobManager`: web request threads (delivering `APIJobRequest`s and abortion signals) and subprocess threads (stdio, exit codes)
- The `JobManager`'s internals are stateful: queues, maps, etc.
Combined, these make the `JobManager` one of the more complicated parts of the system.
This ticket houses the refactoring effort required to make scheduling jobs the `JobManager`'s sole responsibility. The key changes needed are:
Make `JobManager` take:

- A read-only configuration (queue size, scheduling type, etc.)
- A way of requesting submitted jobs from the persistence layer (needs: job ID, timestamps, owner, etc.)
- A way of setting the state of jobs in the persistence layer (e.g. from `submitted` to `running`)
- A `schedule()` method, which manually prompts the `JobManager` to try to schedule, shuffle, etc. jobs
- A `signalAbort(JobId jobId)` method, which handles job abortion
- A method that takes a `JobId` and an `onExit` callback and creates a `RunningJob`, which the `JobManager` handles.
This method, or "executor", is responsible for output persistence, rather than the job manager. This moves some of the complexity (especially multithreading and event emission) into the executor, where the data is emitted, and away from the scheduler.
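As a rough sketch, the slimmed-down scheduler could look something like the class below. All names and signatures are illustrative assumptions, not Jobson's actual API; the persistence layer and executor are stubbed out with plain in-memory collections so the scheduling behaviour stands alone:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SchedulerSketch {
    private final int maxConcurrentJobs;    // read-only scheduling config
    private final Deque<String> submitted;  // stand-in for submitted jobs in the persistence layer
    private final List<String> running = new ArrayList<>();  // the internal executing-job map, simplified

    public SchedulerSketch(int maxConcurrentJobs, Deque<String> submitted) {
        this.maxConcurrentJobs = maxConcurrentJobs;
        this.submitted = submitted;
    }

    // schedule(): manually prompts the scheduler to promote submitted
    // jobs to running while there is spare capacity.
    public void schedule() {
        while (running.size() < maxConcurrentJobs && !submitted.isEmpty()) {
            // A real implementation would hand the job to an executor here
            // and flip its persisted state from "submitted" to "running".
            running.add(submitted.poll());
        }
    }

    // signalAbort(jobId): handles abortion by dropping the job
    // from the running set.
    public void signalAbort(String jobId) {
        running.remove(jobId);
    }

    // The executor's onExit callback: free the slot and reschedule.
    public void onExit(String jobId) {
        running.remove(jobId);
        schedule();
    }

    public List<String> running() {
        return new ArrayList<>(running);
    }
}
```

Note that in this shape the scheduler never touches stdio, outputs, or persistence details directly; it only reacts to `schedule()`, `signalAbort(...)`, and `onExit(...)` prompts, which is the separation the ticket is after.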