Closed haixuanTao closed 3 months ago
So the idea of this is that nodes are started manually and then matched by name, am I understanding this correctly? So it's kind of a hybrid between the current declarative approach and an "interactive dataflow" feature, where nodes can connect to the daemon dynamically, without being specified in a dataflow YAML file.
I'm not sure if `detached` is the best name for this. To me, the term "detach" is associated with detached threads, which keep running in the background without a join handle. So at first, I thought that detaching a node keeps it running even after the rest of the dataflow stops. Perhaps `manual_start` (or similar) is a better name for that flag?
Another thing that I don't understand yet: are the `build` and `source` keys considered at all for such a detached/manually started node? To me, it would make sense to disallow these keys for manually started nodes to clearly indicate that the matching needs to be done by name.
Yes, basically, you can spawn a new node without using `dora start`, and the node will be able to connect to the running dataflow by querying the coordinator for the `DORA_NODE_CONFIG`.
An additional change makes it possible to not spawn the node, and to not close its outputs when stopping, so that a "detached" node can be spawned multiple times.
In my opinion, this does not break the declarative paradigm, as the inputs and outputs are still specified within the dataflow, just not the spawning.
The `build` and `source` keys are not used. The idea is that by adding `detached` you can quickly debug the dataflow. You can also safely remove `detached` and everything still works.

I think we can add warnings that the `source` and `build` keys are not going to be used.
> The `build` and `source` keys are not used. The idea is that by adding `detached` you can quickly debug the dataflow. You can also safely remove `detached` and everything still works. I think we can add warnings that the `source` and `build` keys are not going to be used.
I'm still in favor of throwing an error. Commenting out the `build` and `source` keys is quick and simple, and it makes it more obvious what's happening. Silently ignoring the `build` key is a huge footgun because it subtly changes the behavior of `dora build`, so you might end up with a long debugging session where you never apply the changes that you made to a detached node (if you forget to build it manually after `dora build`). Ignoring `source` is less dangerous IMO, but I can imagine that it still leads to confusion, e.g. if somebody accidentally commits a dataflow with both `source` and `detached` set, maybe separated by lots of inputs and outputs.

A warning would be better than nothing, but it is easy to miss.
> not close its output when stopping so that we can spawn multiple time a "detached" node.
I don't think that this is a good idea because it's a subtle behavior change in a feature intended for debugging. Consider the following situation:
…the `detached` feature and decide to use it for debugging.
So, to be more explicit, I have added a filter on the daemon to only wait for nodes that are not detached. So using:
Will gracefully stop the dataflow.
I understand that this could potentially introduce additional edge cases, but it seems to me that being able to start and stop a specific node is more useful than worrying about not being able to debug a downstream node that might not gracefully stop from an InputClosed situation.
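As an illustrative sketch (in plain Python, not the actual daemon code), the filter described here could look like the following, assuming each node's YAML entry is a mapping with an optional `detached` flag:

```python
def nodes_to_wait_for(nodes: list[dict]) -> list[str]:
    """Return the ids of the nodes the daemon should wait on before
    declaring the dataflow finished: every node that is not detached."""
    return [n["id"] for n in nodes if not n.get("detached", False)]

# Example: the daemon would wait for webcam and object_detection,
# but not for the detached plot node, so `dora stop` can finish
# gracefully even while plot is spawned and re-spawned manually.
dataflow = [
    {"id": "webcam"},
    {"id": "object_detection"},
    {"id": "plot", "detached": True},
]
```

With this filter, stopping the dataflow only waits on the declaratively spawned nodes, which matches the behavior described above.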
Side notes, not directly related to this PR:
* I think it would be a good idea to introduce "dynamic nodes" that can connect to a dataflow at runtime. For example, you could spawn a logger node that prints out some specific messages (sent by some node specified in the dataflow).
* We could also add "global topics" as a form of eternal output streams. Multiple nodes could publish to a topic and dynamic nodes could publish to them too. Topics could be mapped as inputs.
* This gives us new ways to control dataflows at runtime. For example, a CLI could send a control command to a `control` topic, which is then handled by one of the traditional nodes specified in the YAML file.
@haixuanTao @heyong4725 What do you think about this?
Pinging back on this. I hope that this PR is a stepping stone toward having more dynamic nodes, and indeed more freedom to send messages to the dataflow.
What about keeping `source` but using another special keyword like `interactive` in place of a path, so that `source` is always required, we don't have to make it optional, and we don't need to introduce another key?
Just to be clear, this PR doesn't touch the build script. Anything calling `dora build` will be built just as before, so I don't understand how this would be a footgun. Only the process-spawning logic is changed.
Back on the topic of not closing inputs: I think it is more intuitive that nodes are not stopped because a node is interactive than that nodes are stopped after the interactive node closes.
It would look something like:
```yaml
nodes:
  - id: webcam
    source: ./webcam.py
    inputs:
      tick:
        source: dora/timer/millis/50
        queue_size: 1000
    outputs:
      - image
  - id: object_detection
    source: ./object_detection.py
    inputs:
      image: webcam/image
    outputs:
      - bbox
  - id: plot
    source: interactive
    inputs:
      image: webcam/image
      bbox: object_detection/bbox
```
> What about keeping `source` but using another special keyword like `interactive` in place of a path, so that `source` is always required, we don't have to make it optional, and we don't need to introduce another key?
Good idea, I like it!
> Just to be clear, this PR doesn't touch the build script. Anything calling `dora build` will be built just as before, so I don't understand how this would be a footgun. Only the process-spawning logic is changed.
Ah, then disregard my concern. I misinterpreted your "The `build` and `source` keys are not used." comment above.
FYI @phil-opp, I made a new `init_flexible` within Rust, which is the default implementation for Python.

I have removed an assertion on `node_id` to make it less complicated, and always use `DORA_NODE_CONFIG` if it exists, as it is very likely coming from the daemon.
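As a rough illustration of that fallback (in plain Python, not the actual dora implementation), the logic could look like the following; `query_coordinator_for_config` is a hypothetical placeholder for the real coordinator query:

```python
import os


def query_coordinator_for_config(node_id: str) -> str:
    # Hypothetical placeholder: in the real implementation this would
    # ask the running coordinator for the node's config, matched by name.
    return f"config-for-{node_id}"


def resolve_node_config(node_id: str) -> str:
    # Prefer DORA_NODE_CONFIG if present: it was very likely injected
    # into the environment by the daemon when spawning this node.
    config = os.environ.get("DORA_NODE_CONFIG")
    if config is not None:
        return config
    # Otherwise the node was started manually (detached/interactive),
    # so fall back to querying the coordinator by node id.
    return query_coordinator_for_config(node_id)
```

The point of the sketch is the precedence: a daemon-spawned process keeps today's behavior, while a manually started one is matched by name.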
Would love to have a last round of review to make sure we're on the same page.
Description
This PR makes it possible to run a process as a dynamic process, in order to debug, run tests, as well as run Python files interactively.
Getting started
In the YAML graph
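As a sketch based on the example discussed earlier in the thread, the dynamically started node could be declared with the `interactive` keyword in place of a `source` path (assuming the naming proposed above):

```yaml
nodes:
  - id: webcam
    source: ./webcam.py
    inputs:
      tick:
        source: dora/timer/millis/50
        queue_size: 1000
    outputs:
      - image
  - id: object_detection
    source: ./object_detection.py
    inputs:
      image: webcam/image
    outputs:
      - bbox
  - id: plot
    source: interactive
    inputs:
      image: webcam/image
      bbox: object_detection/bbox
```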
For python
For rust
To run
For python
For rust