Open miloszbednarzak opened 4 years ago
@miloszbednarzak I am curious what would be the use case for the first graph? The reason we lean for graphs of later nature is that it's easier to reason about their execution characteristics.
@savingoyal Let's say that functions in steps a1
and b1
outputs variables of kind one
, and a2
,b2
accordingly. I wanted in step a1b1
and a2b2
to group all data coming from input branches by their kinds.
So if the first option was possible I could do something like that:
# in a1b1 step
ones_data = [input.one_kind for input in inputs]
# in a2b2 step
twos_data = [input.two_kind for input in inputs]
In second, legitimate variant I need to group them in as_bs
step like this:
ones_data = [self.a1_data, self.b1_data]
twos_data = [self.a2_data, self.b2_data]
In this toy example there is not much difference, but my actual graph is much more complex and I'd like to group data by kind not by using their artefact variable name in grouping operation, but by pointing output to the step which will group all inputs automatically.
Makes sense. We are discussing a number of enhancements to the flow structure, including introducing the notion of sub workflows, but the timing is TBD.
@savingoyal : Are you guys planning to add dynamic DAG support ?
@valayDave The exact implementation, as well as UX for sub-workflows, is TBD.
I'd like to implement flow in which its branches intersects with each other like:![test-graph](https://user-images.githubusercontent.com/33376065/79224137-e73e5600-7e5a-11ea-8826-29910d99f6b8.png)
Code:
```python from metaflow import FlowSpec, step class TestFlow(FlowSpec): @step def start(self): self.next(self.a, self.b) @step def a(self): self.next(self.a1, self.a2) @step def a1(self): self.next(self.a1b1) @step def a2(self): self.next(self.a2b2) @step def b(self): self.next(self.b1, self.b2) @step def b1(self): self.next(self.a1b1) @step def b2(self,): self.next(self.a2b2) @step def a1b1(self, inputs): self.merge_artifacts(inputs) self.next(self.join) @step def a2b2(self, inputs): self.merge_artifacts(inputs) self.next(self.join) @step def join(self, inputs): self.merge_artifacts(inputs) self.next(self.end) @step def end(self): pass if __name__ == "__main__": TestFlow() ```It throws error:
I know I can reimplement this like:![test-graph2](https://user-images.githubusercontent.com/33376065/79224903-2d47e980-7e5c-11ea-99da-d55c3fb65efa.png)
Code:
```python from metaflow import FlowSpec, step class TestFlow(FlowSpec): @step def start(self): self.next(self.a, self.b) @step def a(self): self.next(self.a1, self.a2) @step def a1(self): self.next(self.a12) @step def a2(self): self.next(self.a12) @step def b(self): self.next(self.b1, self.b2) @step def b1(self): self.next(self.b12) @step def b2(self,): self.next(self.b12) @step def a12(self, inputs): self.merge_artifacts(inputs) self.next(self.as_bs) @step def b12(self, inputs): self.merge_artifacts(inputs) self.next(self.as_bs) @step def as_bs(self, inputs): self.merge_artifacts(inputs) self.next(self.end) @step def end(self): pass if __name__ == "__main__": TestFlow() ```In my case looking from a perspective of visual graph second implementation looks cleaner, but from implementation perspective it brings unnecessary additional steps.
Are there any plans to add support for this kind of Flows? Thanks!