ExaWorks / job-api-spec

https://exaworks.org/job-api-spec/
3 stars 3 forks source link

Reconsider [Job|JobExecutor].setJobStatusCallback #177

Open hategan opened 1 year ago

hategan commented 1 year ago

First, it seems like a property may be more appropriate here and, the inclusion of "job" in the method name may be superfluous. It is unlikely that significant ambiguity would be introduced if the name was simply setStatusCallback or even 'setCallback`.

However, there are two other, more functional issues:

  1. JobExecutor.get_instance may return a singleton instance. A callback set on that instance previously would be overwritten by any new call to setJobStatusCallback. And, because the callback is a "write-only" property, there is no way to check whether a callback is already set.
  2. This is somewhat related to (1), but chaining callbacks, which is one way of supporting multiple callbacks, requires knowing what the previous callback in the chain is, which is, again, made difficult by the "write-only" aspect of this property.

Suggested Clarification

We should either make setJobStatusCallback into a read/write property (jobStatusCallback) or allow for multiple callbacks directly, e.g., by changing to addJobStatusCallback / removeJobStatusCallback since it's a tried and tested method in both Java and Python.

hategan commented 1 year ago

@andre-merzky, thoughts?

frobnitzem commented 1 year ago

I think some of the difficulty here is that there are too many ways to do the same thing. I'd prefer fixing the definition of the job at initialization, making these properties read-only everywhere that can be tolerated. Similarly, for executors, if they have default properties like callbacks, they should be fixed on creation of the executor. Changes to the executor should definitely not effect already created jobs.

andre-merzky commented 1 year ago

I think I am in the opposite camp to @frobnitzem I'm afraid. I like that job's can be initialized in multiple steps as one then can separate concerns in job creation code:

job = create_job
add_staging(job)
add_environment(job)
submit_job(job)

That's not a useful example, but I hope it illustrates the point.

Changes to the executor should definitely not effect already created jobs.

I think that adding a state callback at any point should trigger state callbacks for all jobs of that executor, not only for the ones submitted after. That would (needlessly) complicate implementation, but more importantly we need to consider reconnecting executors which certainly should invoke callbacks for jobs submitted earlier by different instances of the executor.

I would favor addJobStatusCallback / removeJobStatusCallback for the same reasons as mentioned by @hategan.

hategan commented 1 year ago

While it would generally be good to keep discussions on different topics separate, I can see how the read-only properties and callbacks may be related.

I'd like to point out that two ways to do things is not quite that many and the ability to initialize read-write fields through constructors is nearly universal across the imperative/OO language spectrum. There is also some precedent in Python standard libraries for this. Unfortunately, most classes in the Python standard libraries are rather simplistic, but, e.g., Thread allows setting the name and daemon properties both thought the constructor and, after construction, through properties.

As for add/remove, there is some precedent in concurrent.futures.Future and asyncio.Future. The former only has a add_done_callback, but the latter also allows remove_done_callback.

frobnitzem commented 1 year ago

I'd like to append to my comment above that mutable state is much harder to reason about - so should not be used unless there's no other way. Especially for data that will be executed later it's important to control the number of code paths that update it (attack surface), and ideally have a single validation point where everything can be considered as a whole. Also practically, too many ways to modify the task graph (e.g. in Fireworks) creates many potential deadlock scenarios.

hategan commented 1 year ago

I'd like to append to my comment above that mutable state is much harder to reason about - so should not be used unless there's no other way. Especially for data that will be executed later it's important to control the number of code paths that update it (attack surface), and ideally have a single validation point where everything can be considered as a whole. Also practically, too many ways to modify the task graph (e.g. in Fireworks) creates many potential deadlock scenarios.

Perhaps this should be documented better, but it's more of a quasi-mutable state: you build your spec object up to the point when you submit the job, after which modifications are irrelevant. I agree with the principle that mutable state in concurrent systems is the root of a lot of evil, but I don't think that's quite what we have here. The ability to build a JobSpec object incrementally is a matter of convenience meant to be done sequentially. Of course, we cannot ultimately eliminate the ability to, say, populate the argument list from multiple threads. One can do so with or without read-only properties, completely outside of PSI/J.

There is also purposefully no task graph in PSI/J and the tasks are completely independent. A system built on top of it can and should take reasonable steps to avoid way of creating deadlocks.