glotzerlab / signac-flow

Workflow management for signac-managed data spaces.
https://signac.io/
BSD 3-Clause "New" or "Revised" License
48 stars 37 forks source link

Revising the signac-flow API #665

Open kidrahahjo opened 2 years ago

kidrahahjo commented 2 years ago

Feature description

Coming from https://github.com/glotzerlab/signac/wiki/GSoC-2022-Projects#2-revising-the-signac-flow-api, we can start splitting this up into chunks and move towards kind-of a decorator-less API.

Proposed solution

I think we do these in the sequence posted below (ordered in increasing complexity)

  1. Allow users to pass with_job and cmd through the operation decorator.
  2. Allow users to pass operation_aggregator/group_aggregator through operation decorator / make_group
  3. Allow users to pass operation level directives through the operation decorator. This involves dropping support of with_directives for operations in the process.
  4. Allow users to pass directives through the make_group. This involves dropping support of with_directives for groups in the process.
  5. Allow users to pass groups through the operation decorators.
  6. Drop support of all the decorators above 🗡️

I might have missed a few, please feel free to add this in the description itself.

b-butler commented 2 years ago

We discussed the proposed changes in the developer meeting today (2022-08-24), and agreed on 1, 2, and 3. We disagreed with 4, preferring to remove group.with_directives in favor of a keyword directives in the __call__ method. We also disagreed on 5, preferring to keep group registration distinct from operation creation.

The disagreements from the proposed API (primarily 4) in part stem from the burden of having to be able to create a full mapping of directives of a group at instantiation which for dynamically generated operations is unduly burdensome. This makes 5 impossible to cleanly do then as specifying both groups associated with an operation and the group's operation directives is clunky.

Below is an example of the modified proposed API changes.


@group(directives={"walltime": 2.0})
@FlowProject.operation(
    with_job=True,
    aggregator=agg_func,
    directives={"walltime": 1.0}
)
def op(job):
    pass
b-butler commented 1 year ago

Also what are people's thoughts on keeping pre/postconditions as decorators? The two options as I see it are

@FlowProject.pre.true("run_op")
@FlowProject.operation(directives={...}, with_job=True):
    pass

or

@FlowProject.operation(directives={...}, with_job=True, pre=[FlowProject.pre.true("run_op")]):
    pass

Elsewhere, we have moved to a single decorator approach which simplifies the logic on our side. I would say the first is more "pretty" in an ascetic sense, but the second follows a declaration (making a function an operation) is definition (defines the parameters of the operation) which is simpler to code and localize. Overall, given our push on @with_job, @cmd, @directives, etc, I vote for the second, but would love to see others' opinions.

vyasr commented 1 year ago

@csadorf It feels like we are slowly coming full circle back to the original signac-flow API, no? :smile: This is looking eerily similar to

class Project(FlowProject):
    def __init__(self):
        self.add_operation(...)

just with different arguments.

Having said that, I agree that option 2 is better. It gives a more declarative and unified API. I think it would also fit better with some of the longer-term refactorings around the interaction of projects and groups.

bdice commented 1 year ago

I am hesitant about the proposed changes to make pre an argument instead of its own decorator. This is a pretty aggressive change to make in a vacuum unless users are clamoring for it. Most of our "decorator removals" in recent past have been required for simplifying internals and preventing users from causing various bugs from decorator ordering. However, we're getting stricter about enforcing decorator ordering, which should eliminate an entire class of these bugs. Decorators are quite expressive, composable, and concise in my opinion, if we can avoid the problems we faced in the past with invalid orderings.

That said, I understand there is some good motivation from the changes to with_job, directives, etc. I am realizing that this new API (introduced over the course of several versions each with breaking changes) is a totally different structure for defining workflows than what I used for my research 2 years ago.

csadorf commented 1 year ago

I agree with @bdice here. I don't clearly see the user-benefit to moving all arguments and options into a single decorator.

The current API clearly delineates the description of the workflow logic from the description of the execution environment. Operations(-group) decorators declare a function to be part of the worklow and allow for the specification of the execution environment. The direct association of operation-decorators and directives therefor makes sense. The pre- and post-conditions on the other hand describe the worklow logic. This separation allows users to very easily describe and understand their workflow and then declare multiple execution environments for those operations.

Further, and for the reason I just explained, I would assume that in the vast majority of cases the conditions would be similar even for operation-functions that are part of multiple operation-groups. Forcing users to declare conditions as part of the operation-decorators would lead to substantial duplication. Think about it this way, most user workflows would probably work perfectly fine if the @operation decorator was completely implicit. Only @pre and @post decorators would likely be enough in the majority of cases. The same cannot be said the other way round.

Last, but not least, the argument "there should only be one way to do one thing" overlooks that the need to use special conditions for special execution environments or sub-workflows is not exactly the same thing as describing the general workflow logic and should therefor be considered a special or advanced case. And even if it was equivalent, it would appear as an overzealous application of this general guideline to me.

I think we would lose one of the most appealing elements of the current API by making this change and I do not support this unless someone can provide a very compelling argument.

b-butler commented 1 year ago

Thanks for the thoughts @bdice and @csadorf. The most compelling two arguments I can think of for the new syntax would be

  1. For new, users I think the API is more clear. There is one main function that integrates a function into a project. I think of projects like numba with its various options in a decorator. Decorators are composable and nicely so as @bdice points out. That is, however, primarily for use in cases where the functionality is disparate. Here we are more registering an operation, though as @csadorf points out this could be considered as two separate parts of the essence of a workflow.
  2. Not even needing to worry about decorator ordering or future bugs in this vein, as well as the simplicity of a single central location of truth for operation defining, is a maintenance good. While I don't have particular difficulties with the code as is, making the code such that a new developer could go to a single place to search for bugs of a certain kind is likely to be advantageous to a project that has most of its maintainers taking steps back from actual development of the project in various degrees either now or in the near future.

To address @csadorf comments specifically

The current API clearly delineates the description of the workflow logic from the description of the execution environment. Operations(-group) decorators declare a function to be part of the worklow and allow for the specification of the execution environment. The direct association of operation-decorators and directives therefor makes sense. The pre- and post-conditions on the other hand describe the worklow logic. This separation allows users to very easily describe and understand their workflow and then declare multiple execution environments for those operations.

I am not so certain that workflow logic and execution are meaningful to separate. They are certainly different, but I am not sure that warrants their separation, and I fail to see how this makes defining multiple execution environments difficult (unless using some class hierarchy which separates logic from execution).

Further, and for the reason I just explained, I would assume that in the vast majority of cases the conditions would be similar even for operation-functions that are part of multiple operation-groups. Forcing users to declare conditions as part of the operation-decorators would lead to substantial duplication. Think about it this way, most user workflows would probably work perfectly fine if the @operation decorator was completely implicit. Only @pre and @post decorators would likely be enough in the majority of cases. The same cannot be said the other way round.

The ways groups are designed I feel this is a non-issue. No duplication would occur if using a single project class with multiple execution groups.

Last, but not least, the argument "there should only be one way to do one thing" overlooks that the need to use special conditions for special execution environments or sub-workflows is not exactly the same thing as describing the general workflow logic and should therefor be considered a special or advanced case. And even if it was equivalent, it would appear as an overzealous application of this general guideline to me.

I agree that multiple execution environments is relatively advanced. However, I don't think this makes the general case any more difficult than the current API. Now if by this you are referring to conditions changing with execution environments then yes the new API would result in users having to make lists of conditions based on environment rather than simply composing decorators based on environments. Even here though, I would argue a new Python user would likely find creating condition lists more intuitive then using a decorator as a standard function.

I think the nature of the break (essentially every modern script will be broken) is a very strong counter point to the arguments I give above though. As I said in the original comments though, I am not trying to see my dog win necessarily, but I wanted to start this discussion whatever the outcome may be.

csadorf commented 1 year ago

I am not so certain that workflow logic and execution are meaningful to separate. They are certainly different, but I am not sure that warrants their separation, and I fail to see how this makes defining multiple execution environments difficult (unless using some class hierarchy which separates logic from execution).

I would like to stress that the separation of workflow logic and execution environment is one of the main innovation drivers in this space. Basically any modern workflow orchestrator, whether that is dask, or PREFECT, or Metaflow will aim to separate the description of the workflow logic and its execution via worker units. Ideally, a researcher should not need to worry about the execution environment at all (local, cloud, Kubernetes cluster, etc.).

Signac is actually very good at this. You only have to worry about the execution environment when you start running on HPC clusters and then you need to provide only the information that these HPC systems simply cannot infer.

b-butler commented 1 year ago

I guess what I mean is what is meaningfully different in the separation of execution details from

@FlowProject.operation(pre=..., post=...)
def op(job):
    pass

@FlowProject.post.true("bar")
@FlowProject.pre.true("foo")
@FlowProject.operation()
def op(job):
    pass

We do allow the simultaneous specification of both the execution environment and logic, but we do not mix them internally so to speak. The question is this a useful distinction for the user to be exposed to. Through groups we allow for other execution environments outside the default, but explicitly do not allow for different logic.

vyasr commented 1 year ago

I see a lot of good points being made here. I don't have a strong opinion regarding the aesthetics of the API that @bdice mentioned, but I do agree with @csadorf that separating execution environments from workflow definition is an area where flow stands out and we don't want to lose that. I am not necessarily convinced that moving pre/post into the operation declaration breaks that separation in a meaningful way (mostly because I think the way everything is tied into the FlowProject already blurs these lines to a significant extent, but that is a topic for a separate discussion), but I am also not convinced that the benefit of the change is large enough to overcome the heavy reservations expressed in this thread.

@bdice's comment on simplifying internals is my bigger focus. As I have stated repeatedly to various degrees, overemphasizing API stability hinders the long-term sustainability of flow at present. I think others agree, which has led to a relatively fast deprecation cycle in recent releases. I anticipate further large changes to flow internals as we refactor, and I expect that those will lead to API breaks around operation declaration anyway. Given the current state of this discussion, my inclination is to table changes to pre/post until we see how other refactorings fall out and take it from there.

csadorf commented 1 year ago

The question is this a useful distinction for the user to be exposed to.

Then answer is a very clear yes! This is a very important distinction for the user. Again, a user should be enabled to declare workflow logic and execution environment independently. Having the ability to compartmentalize these two problems changes how you are able to think and approach the problem and is not purely aesthetic. We have always claimed that one of the main benefits of using signac is that of reproducibility. Being able to read and understand a signac workflow without getting distracted by the execution conditions is a key driver for that.

I am not necessarily convinced that moving pre/post into the operation declaration breaks that separation in a meaningful way (mostly because I think the way everything is tied into the FlowProject already blurs these lines to a significant extent [...].

If there are areas in the flow API where that separation is not or only poorly upheld, then that would be an area for improvement, not an argument against maintaining said separation.

As I have stated repeatedly to various degrees, overemphasizing API stability hinders the long-term sustainability of flow at present.

This is not a question of API stability IMO. I am not against this change because it is breaking, I am against this change, because I think it regresses our API and reduces the user experience.

If you want to make any meaningful change here that also reduces the implementation complexity, I could get on-board with a full separation of workflow logic and execution directives where each of those two things are declared as two separate decorators (@logic and @directives). But such a change would be the literal opposite of what is proposed here and I am further not convinced that moving all conditions into a single decorator is that much easier to implement compared to multiple ones.

b-butler commented 1 year ago

Given the strength of oppositions, I say we keep everything as is. I will get to crafting a release then.

vyasr commented 1 year ago

@csadorf sorry if I wasn't clear enough, I don't think I properly stated my position. I am agreeing with pretty much everything that you're saying, I just wanted to make sure that everyone was on the same page regarding

This is not a question of API stability IMO. I am not against this change because it is breaking,

I wanted to clarify that the highest priorities for me with regards to the broader point of this issue (revising the signac-flow API) are: 1) Making any breaking changes needed to help reduce implementation complexity, remove bugs, and improve project sustainability, and 2) Stabilizing the core APIs that users are accustomed to as long as they don't actively hinder the simplifications in (1)

My main point was that I do not think we should be putting much effort into API changes that are primarily aimed at improving the user experience. We should prioritize API stability because I think it's good for the project to remain in a state where multiple maintainers are still familiar with the current APIs and usage patterns, but not at the expense of not being able to simplify the code in ways that improve long-term maintainability. Specifically, I would support changes like removing support for FlowProject inheritance for workflow composition since FlowGroups can do the same thing, or moving more of the submission/execution logic out of FlowProject into FlowGroup to better separate workflow logic from input data. I think those types of changes could substantially simplify our internals while only affecting a tiny fraction of users.

csadorf commented 1 year ago

@b-butler Please do not feel discouraged to argue your position just because there is some opposition. I intentionally waited with my initial response because I wanted to make sure that I actually have arguments against this change rather than just having a reactionary response, because I am personally used to (and thus emotionally attached to) the current API.

@vyasr Thank you for clarifying. Fully in support of any sensible changes that simplify implementation with only very minor impact on the quality of user experience. That said, I believe that a trade-off between API quality and implementation clarity usually hints at an inconsistency in the translation of the user intent to our data and execution models. It means the API either oversimplifies the problem (i.e. lacks information) or does not faithfully represent the problem. This means that in the majority of cases, API quality and implementation clarity should be correlated with each other.

b-butler commented 1 year ago

@csadorf I appreciate the concern :smiley:. I haven't taken it personally, we are just debating opinions. My last comment was more me saying, given a sizable group of opposition you, @cbkerr , and @bdice that it is probably sensible to keep as is for now. It isn't as if your arguments carry no merit. I want to move on the next couple of releases, and given this is a bit more of a divided issue than some of the previous changes, I punting the decision it into the future is the best course of action.

vyasr commented 1 year ago

This means that in the majority of cases, API quality and implementation clarity should be correlated with each other.

I agree with that statement when starting from a blank slate. That does not represent the current state of flow, though. The specific examples I mentioned above are a few of the cases where I think our internal execution model blurs lines between concepts or duplicates functionality. Part of the issue in my view is that the API is too tightly coupled to the CLI, and I think we would be better served by introducing an additional layer between the internals and the Python APIs that implement CLI functionality. I do agree that improving our internal implementation should, in the long term, also improve our API.

csadorf commented 1 year ago

This means that in the majority of cases, API quality and implementation clarity should be correlated with each other.

I agree with that statement when starting from a blank slate. That does not represent the current state of flow, though. The specific examples I mentioned above are a few of the cases where I think our internal execution model blurs lines between concepts or duplicates functionality. Part of the issue in my view is that the API is too tightly coupled to the CLI, and I think we would be better served by introducing an additional layer between the internals and the Python APIs that implement CLI functionality. I do agree that improving our internal implementation should, in the long term, also improve our API.

It seems like you are ultimately agreeing with me. 😁