dora-rs / dora

DORA (Dataflow-Oriented Robotic Application) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low latency, composable, and distributed dataflow capabilities. Applications are modeled as directed graphs, also referred to as pipelines.
https://dora-rs.ai
Apache License 2.0
1.36k stars 69 forks source link

Configurable bind addrs #471

Closed Michael-J-Ward closed 2 months ago

Michael-J-Ward commented 2 months ago

Changelog

The CLI provides the defaults, so this shouldn't break any existing run commands.

Notes

~cargo fmt --all makes a lot of unrelated changes. Do you have a rustfmt config somewhere that I can use? Similarly, cargo clippy gives a fair number of lints.~

~If you'd like, let me know if you want me to include those - either in addition to this or as a separate PR.~

EDIT: I realized my own rustfmt was running off a custom config.

Ref #459

Michael-J-Ward commented 2 months ago

Rebased off main and applied the appropriate rustfmt to the commits.

Michael-J-Ward commented 2 months ago

Multi-daemon example breaks on Windows but not on Linux.

2024-04-16T16:10:17.406865Z  WARN dora_daemon::node_communication: failed to send event to daemon
failed to run dora-daemon: failed to forward output to remote receivers: failed to connect to machine `B`: failed to connect: The requested address is not valid in its context. (os error 10049)

...

 2024-04-16T16:10:17.412084Z ERROR dora_coordinator::listener: Os { code: 10054, kind: ConnectionReset, message: "An existing connection was forcibly closed by the remote host." }
    at binaries\coordinator\src\listener.rs:33

@phil-opp did you update those logs in an attempt to debug?

Michael-J-Ward commented 2 months ago

Ahh - 0.0.0.0 is not connectable by clients on windows, whereas linux routes it to your localhost.

Fixing.

Michael-J-Ward commented 2 months ago

Daemon::run uses its configuable address to construct register its listen_socket with the coordinator in coordinator_messages::CoordinatorRequest::Register.

If a window's users configures that to 127.0.0.1 or <vpn-ip>, that works fine.

But if a Window's users configures it to 0.0.0.0, then the daemon listen_socket can't actually be connected to as an InterDaemonConnection.

phil-opp commented 2 months ago

@phil-opp did you update those logs in an attempt to debug?

Yes, but I think it's also an improvement that we might want to keep.

Daemon::run uses its configuable address to construct register its listen_socket with the coordinator in coordinator_messages::CoordinatorRequest::Register.

If a window's users configures that to 127.0.0.1 or <vpn-ip>, that works fine.

But if a Window's users configures it to 0.0.0.0, then the daemon listen_socket can't actually be connected to as an InterDaemonConnection.

Thanks a lot for debugging this! This is indeed the issue. However, I think the correct fix is to use the public IP instead of the local bind address for the registration. One way to get the public IP of the daemon could be to take the connection.peer_addr().ip() of the incoming connection. The Register message then only needs to tell us the (public) port number, on which the daemon listens for incoming inter-daemon connections.

I opened a PR against your PR to implement this change: https://github.com/Michael-J-Ward/dora/pull/1. I also tested this change on our CI in https://github.com/dora-rs/dora/pull/473.

Michael-J-Ward commented 2 months ago

Ready to go.

One follow-on item raised by @haixuanTao: a proper "distributed-test" example where the daemon's run separately.