dora-rs / dora

DORA (Dataflow-Oriented Robotic Architecture) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low latency, composable, and distributed dataflow capabilities. Applications are modeled as directed graphs, also referred to as pipelines.
https://dora-rs.ai
Apache License 2.0
1.5k stars 79 forks source link

Run dynamic node #517

Closed haixuanTao closed 3 months ago

haixuanTao commented 4 months ago

Description

This PR makes it possible to run process as a dynamic process to be able to debug, run tests as well as run python file interactively.

Getting started

In the YAML graph

nodes:
  - id: webcam
    source: ./webcam.py
    inputs:
      tick:
        source: dora/timer/millis/50
        queue_size: 1000
    outputs:
      - image

  - id: object_detection
    source: ./object_detection.py
    inputs:
      image: webcam/image
    outputs:
      - bbox

  - id: plot
    source: dynamic
    inputs:
      image: webcam/image
      bbox: object_detection/bbox

For python

# Simply add a node_id when you initialize a node

node = Node("plot")

For rust

        DoraNode::init_from_node_id(None, NodeId::from("rust-sink-detached".to_string()), None)?;

To run

For python

dora start examples/python-dataflow/dataflow_detached.yml --name ci-python-detached
python examples/python-dataflow/plot_detached.py
sleep 5
dora stop --name ci-python-test  --grace-duration 5s

For rust

dora start examples/rust-dataflow/dataflow_detached.yml --name ci-rust-detached
cargo run -p rust-dataflow-example-sink-detached
sleep 5
dora stop --name ci-rust-detached  --grace-duration 5s
phil-opp commented 4 months ago

So the idea of this is that nodes are started manually and then matched by name, am I understanding this correctly? So kind of a hybrid between the current declarative apporach and an "interactive dataflow" feature, where nodes can connect to the daemon dynamically, without being specified in a dataflow YAML file.

I'm not sure if detached is the best name for this. To me, the term "detach" is associated with detached threads, which keep running in the background without a join handle. So at first, I thought that detaching a node keeps it running even after the rest of the dataflow stops. Perhaps manual_start (or similar) is a better name for that flag?

Another things that I don't understand yet: Are the build and source keys considered at all for such a detached/manually started node? To me, it would make sense to disallow these keys for manually started nodes to clearly indicate that the matching needs to be done by name.

haixuanTao commented 4 months ago

Yes, basically, you can spawn a new node without using dora start and the node will be able to connect to the running dataflow by querying the coordinator for the DORA_NODE_CONFIG.

Additional change enables to not spawn the node as well as not close its output when stopping so that we can spawn multiple time a "detached" node.

In my opinion, this does not break the declarative paradigm as the input and outputs are still specified within the dataflow, just not the spawning.

The buildand source keys are not used. The idea is that by adding detached you can quickly debug the dataflow. You can also safely remove detachedand everything still work.

I think we can add ẁarnings that the source and build key are not going to be used.

phil-opp commented 4 months ago

The buildand source keys are not used. The idea is that by adding detached you can quickly debug the dataflow. You can also safely remove detachedand everything still work.

I think we can add ẁarnings that the source and build key are not going to be used.

I'm still in favor of throwing an error. Commenting out the build and source keys is quick and simple and it makes it more obvious what's happening. Silently ignoring the build key is a huge footgun because it subtly changes the behavior of dora build, so you might end up with a long debug debug session where you never apply the changes that you made to a detached node (if you forget to build it manually after dora build). Ignoring source is less dangerous IMO, but I can imagine that it still leads to confusion, e.g. if somebody accidentally commits a dataflow with both source and detached set, maybe separated by lots of inputs and outputs.

A warning would be better than nothing, but it is easy to miss.

phil-opp commented 4 months ago

not close its output when stopping so that we can spawn multiple time a "detached" node.

I don't think that this is a good idea because it's a subtle behavior change in a feature intended for debugging. Consider the following situation:

phil-opp commented 4 months ago

Side notes, not directly related to this PR:

@haixuanTao @heyong4725 What do you think about this?

haixuanTao commented 4 months ago

So, to be more explicit, I have added a filter on the daemon to only wait for nodes that are not detached. So using:

Will gracefully stop the dataflow.

I understand that this could potentially introduce additional edge cases, but it seems to me that being able to start and stop a specific node be more useful than the worries about not being able to debug a downstream node that might not gracefully stop from InputClosed situation.

haixuanTao commented 4 months ago

Side notes, not directly related to this PR:

* I think it would be a good idea to introduce "dynamic nodes" that can connect to a dataflow at runtime. For example, you could spawn a logger node that prints out some specific messages (sent by some node specified in the dataflow).

* We could also add "global topics" as a form of eternal output streams. Multiple nodes could publish to a topic and dynamic nodes could publish to them too. Topics could be mapped as inputs.

  * This gives us new ways to control dataflows at runtime. For example, a CLI could send a control command to a `control` topic, which is then handled by one of the traditional nodes specified in the YAML file.

@haixuanTao @heyong4725 What do you think about this?

Pinging back on this, I hope that this PR is a stepping stone on having more dynamic node indeed to have more freedom to send message to the dataflow.

haixuanTao commented 4 months ago

What about using source <=> path use another special keyword like: interactive so that source is always required and we don't have to make it optional, and we don't need to introduce another key.

Just to be clear, this PR don't touch at the build script. So anything calling dora build will be build just as before, so I don't understand how this will be a footgun. Only spawning the process is the logic changed.

Back on the topic of not closing input. I think that it is more intuitive that the node are not stopped because a node is interactive than, that node are stopped after closing the interactive node.

haixuanTao commented 4 months ago

It would look something like:

nodes:
  - id: webcam
    source: ./webcam.py
    inputs:
      tick:
        source: dora/timer/millis/50
        queue_size: 1000
    outputs:
      - image

  - id: object_detection
    source: ./object_detection.py
    inputs:
      image: webcam/image
    outputs:
      - bbox

  - id: plot
    source: interactive
    inputs:
      image: webcam/image
      bbox: object_detection/bbox
phil-opp commented 4 months ago

What about using source <=> path use another special keyword like: interactive so that source is always required and we don't have to make it optional, and we don't need to introduce another key.

Good idea, I like it!

Just to be clear, this PR don't touch at the build script. So anything calling dora build will be build just as before, so I don't understand how this will be a footgun. Only spawning the process is the logic changed.

Ah, then disregard my concern. I misinterpreted your "The buildand source keys are not used." comment above.

haixuanTao commented 3 months ago

FYI @phil-opp , I made a new init_flexible within Rust which is the default implementation for Python.

I have removed an assertion on node_id to make it less complicated and always use DORA_NODE_CONFIG if it exist as it is very likely coming from the daemon.

Would love to have a last round of review to make sure we're on the same page.