glotzerlab / signac-flow

Workflow management for signac-managed data spaces.
https://signac.io/
BSD 3-Clause "New" or "Revised" License
48 stars 37 forks source link

Detect and automatically execute dependencies of an operation #35

Open csadorf opened 6 years ago

csadorf commented 6 years ago

Original report by Jens Glaser (Bitbucket: jens_glaser, GitHub: jglaser).


For local testing, I would expect that @Project.pre(...) decorators are honored when I try to execute an operation where the precondition is not fulfilled.

E.g. python project.py exec simulate

should evaluate the preconditions of that @Project.operation (simulate) and execute these automatically. Instead, it tries to directly execute the operation in question and fails.

csadorf commented 6 years ago

Original comment by Vyas Ramasubramani (Bitbucket: vramasub, GitHub: vyasr).


Since exec is intended as a simple "just do this" command, I'm not sure you'd want this to be the default, but we could add an option to make this work. @csadorf thoughts?

csadorf commented 6 years ago

Original comment by Jens Glaser (Bitbucket: jens_glaser, GitHub: jglaser).


ignore this. this is meant for issue #36.

@vyasr

Type "help", "copyright", "credits" or "license" for more information.
>>> import flow
>>> print(flow.__file__)
/usr/lib/python3.7/site-packages/flow/__init__.py
>>> print(flow.__version__)
0.6.3
>>> 

on collins

csadorf commented 6 years ago

Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).


Pre- (and post-) conditions are checked when you use run instead of exec, would that solve the issue?

csadorf commented 6 years ago

Original comment by Vyas Ramasubramani (Bitbucket: vramasub, GitHub: vyasr).


I assumed that Jens wanted the preceding operations executed as needed, whereas run just won't run unless the preconditions are met. But in hindsight, the current infrastructure doesn't really support this, since a given operation doesn't know what has to happen to satisfy its preconditions, it just knows what needs to be satisfied.

csadorf commented 6 years ago

Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).


@vyasr I think you are misinterpreting Jens' issue, but maybe you can briefly weigh in @jglaser ?

csadorf commented 6 years ago

Original comment by Jens Glaser (Bitbucket: jens_glaser, GitHub: jglaser).


Vyas interpreted me correctly, maybe I didn't made myself clear initially. I was looking for some automatic dependency resolution. Of course this would mean that you need to traverse that acyclic directed graph, instead of just checking if a node can be executed. This would also lead to simplified job scripts, as instead of the sequence of commands you'd just need to issue the last operation.

csadorf commented 6 years ago

Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).


The operations do not have an explicit linear dependency hierarchy so I would not know how to realize that. If you call project.py run it will execute all eligible operations until there are none left. Is that not what you want?

csadorf commented 6 years ago

Original comment by Jens Glaser (Bitbucket: jens_glaser, GitHub: jglaser).


Just talked with @csadorf . The difference between run is that it iterates over all nodes in the tree and executes them if eligible, but what I'm proposing is slightly more fine grained. Implement a run upto that takes an operation (node) as an argument and automatically satisfies its dependencies. If the tree has more than one leaf, you could choose one the leaves (such as simulate_nvt or simulate_npt) to traverse up from, and satisfy their dependencies, but only theirs, using information about pre and post decorators available. Cyclic dependencies would have to be detected as a user error.

csadorf commented 6 years ago

Original comment by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).


The first step would be to extract the workflow-graph from a given FlowProject definition. I believe that to be possible, albeit not trivial.

Then we can use that graph to determine all missing operations and essentially execute run -o [all-ops-part-of-the-graph-up-until-desired-op].

csadorf commented 5 years ago

@vyasr Can you re-open the DAG pull request which is related to this?

vyasr commented 5 years ago

This is waiting on #71, but should be easy to implement once the graph functionality exists.

vishav1771 commented 4 years ago

I would like to work on this issue.

csadorf commented 4 years ago

@vishav1771 This issues is reserved as a potential Google Summer of Code project so we can't assign it at the moment. However, would you be willing to help us with https://github.com/glotzerlab/signac-flow/pull/189 ?

vishav1771 commented 4 years ago

@csadorf Sure, I'd love to help you with it. Can you please specify how can I help?

csadorf commented 4 years ago

@vishav1771 Here is how you could help:

  1. Familiarize yourself with the feature by reading the code and testing the examples locally.
  2. Identify issues with the design/API and make suggestions as to how to improve the feature.
  3. Read through the current PR comments and collect open action items and questions. I believe there should not be very many.
  4. Fix any remaining and potentially upcoming issues that are identified as part of the review.
  5. Implement unit tests and write basic documentation.

Note: Helping us with any or all of the above items would be great. What I mean is that if you could provide some feedback to the current design even if you don't get to the next steps would already be helpful. However, I would suggest to follow the order, i.e., implementing unit tests probably does not make any sense until you have familiarized yourself with the feature.

csadorf commented 4 years ago

I believe this has been resolved with the merge of #209 .

bdice commented 4 years ago

Discussed with @csadorf offline - this issue is different than the one resolved by #209. This suggested feature is about automatic execution of dependent operations (from the graph detected by preconditions), not ignoring preconditions as handled in #209.

vyasr commented 4 years ago

@csadorf completing this feature requires actually making use of the operation graph to execute dependencies so that preconditions of a particular operation are satisfied. #209 simply makes it possible to run even if preconditions are not satisfied (among other things) That being said, I'm OK leaving this closed for now and reopening when someone has bandwidth.