Starting is an odd state. It describes the processes in a running job and the files those processes create, so it doesn't map cleanly to an actual batch job state. On top of that, K8s uses a completely different algorithm (`queued?` and connection_info available?).
It seems like this knowledge about connection info AND the `starting?` status (whether that connection info is relayed via a file or via some other method) should move to the adapter. A job's "status" could then carry information that is ancillary to the batch state. We wouldn't add a new "starting" state, just as we no longer have a "failed" state, which introduced a set of unintended problems. Instead, just as "error" information can accompany the "completed" state to indicate a failure, a job info object could have more granular attributes describing the job's progress.
Or, at least, move the existing concept of starting into the adapters or the info object.
Or find another solution.
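A minimal Ruby sketch of the ancillary-attribute idea, assuming a job info object with a hypothetical `progress` attribute that each adapter fills in. `JobInfo`, `FileRelayAdapter`, `KubernetesLikeAdapter`, `session_starting?` and the `:starting`/`:ready` values are illustrative names only, not the ood_core API. The K8s rule is loosely based on the "queued? and connection_info available?" description.

```ruby
# Hypothetical sketch: JobInfo, the adapter classes, session_starting? and the
# :starting/:ready progress values are illustrative, not the ood_core API.

# `status` stays an existing batch state (no new :starting state).
# `progress` is ancillary detail, like error info on a :completed job.
JobInfo = Struct.new(:id, :status, :connection_info, :progress, keyword_init: true)

# Batch schedulers where the running job writes a connection file.
class FileRelayAdapter
  # states:     job id => scheduler state symbol
  # conn_files: job id => parsed connection file (absent until written)
  def initialize(states, conn_files)
    @states = states
    @conn_files = conn_files
  end

  def info(id)
    status = @states.fetch(id, :undetermined)
    conn = @conn_files[id]
    # Running but no connection file yet: the processes are still starting.
    progress =
      if status == :running
        conn ? :ready : :starting
      end
    JobInfo.new(id: id, status: status, connection_info: conn, progress: progress)
  end
end

# Kubernetes-like adapter: connection info may already be available while the
# pod still reports :queued (assumed rule, based on the issue's description).
class KubernetesLikeAdapter
  def initialize(states, conn_info)
    @states = states
    @conn_info = conn_info
  end

  def info(id)
    status = @states.fetch(id, :undetermined)
    conn = @conn_info[id]
    progress =
      if status == :queued && conn then :starting
      elsif status == :running && conn then :ready
      end
    JobInfo.new(id: id, status: status, connection_info: conn, progress: progress)
  end
end

# Dashboard side: one code path for every adapter, no K8s-specific branch.
def session_starting?(info)
  info.progress == :starting
end

# Usage
slurm = FileRelayAdapter.new({ "1" => :running }, {})
k8s = KubernetesLikeAdapter.new({ "a" => :queued }, { "a" => { host: "pod-a", port: 8080 } })
session_starting?(slurm.info("1")) # => true
session_starting?(k8s.info("a"))   # => true
```

In both cases the job's status remains a real scheduler state. Only the adapter knows how connection info is relayed, so the dashboard never has to branch on which scheduler it is talking to.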
One of the problems is the control flow here:
https://github.com/OSC/ondemand/blob/dada4cba7d66a6dda334c19a4bf021b616650491/apps/dashboard/app/models/batch_connect/session.rb#L389-L395
and in particular
is very specific to K8s.
This type of code change is more easily done when ood_core is in the monorepo.