Matgenix / jobflow-remote

jobflow-remote is a Python package to run jobflow workflows on remote resources.
https://matgenix.github.io/jobflow-remote/

Open discussion on job states/jobflow integration #21

Open davidwaroquiers opened 1 year ago

davidwaroquiers commented 1 year ago

In light of possible future integration with (or into) jobflow, I am opening this issue to discuss the possible states of a job.

The current states of a job are:

I can think of two additional states that could prove useful:

Both cases would probably be handled exactly the same way as PAUSED, but could be kept distinct for the sake of the user's perspective (although a cancelled job is not meant to be "uncancelled" and a stopped job is not meant to be "unstopped").

It is not yet clear whether jobflow-remote will be integrated completely or partly into jobflow, or left as a third-party package (possibly a mandatory or optional dependency of jobflow). In any case, I think it would be good to use the same JobState in the new queue. One point about REMOTE_ERROR: we chose that name instead of just "ERROR" so that there is no confusion between ERROR and FAILED. If job states move into jobflow, REMOTE_ERROR would no longer make sense in a general setting. We might think of changing the name, but I am really not sure, and if so, to what? EXEC_ERROR? Something else?

@gpetretto Any other comments on this? @utf What do you think? It might be good to have a meeting to discuss this. Do you think we could or should include someone else in the discussion? I will send you an email to see when we can discuss this and the new queue.

davidwaroquiers commented 1 year ago

One thing that could be nice is to have names that all start with a distinct first letter, so that a single letter would also identify a single state (currently READY is R and REMOTE_ERROR is RE). We could have:

Not sure if it is that important though.
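
For what it is worth, the first-letter constraint is easy to check automatically. Below is a minimal sketch, assuming the state names are available as plain strings; the helper `first_letter_collisions` is hypothetical and not part of jobflow or jobflow-remote, and the list only contains state names mentioned in this thread, not the full JobState enum.

```python
from collections import defaultdict


def first_letter_collisions(state_names):
    """Return {letter: [names]} for every first letter shared by 2+ names.

    Hypothetical helper for illustration; not part of jobflow-remote.
    """
    groups = defaultdict(list)
    for name in state_names:
        groups[name[0]].append(name)
    # Keep only the letters that are ambiguous.
    return {letter: names for letter, names in groups.items() if len(names) > 1}


# Illustrative subset: only state names mentioned in this thread.
states = ["READY", "REMOTE_ERROR", "PAUSED", "FAILED"]
print(first_letter_collisions(states))  # {'R': ['READY', 'REMOTE_ERROR']}
```

An empty result would mean that every state can be abbreviated unambiguously by its first letter. A check like this could live in a unit test if we decide the property is worth keeping.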