**haixuanTao** opened this issue 1 year ago
I would expect the CLI to pass the latest value to the daemon and coordinator.
We need to keep deployment to remote machines in mind. We probably don't want the CLI to pass the local env variables to the remote deploy machine.
We already support setting environment variables through the dataflow yaml file, right? Would this be a possible alternative for your use case?
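For context, per-node environment variables in the dataflow YAML look roughly like this (a sketch; the node id, script, and values are placeholders — only the `env` key itself is the existing mechanism):

```yaml
nodes:
  - id: camera                  # placeholder node
    path: camera.py
    env:
      CAMERA_INDEX: "0"         # placeholder values, applied when the node is spawned
      MODEL_PATH: /models/model.onnx
```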
So this is the thing: when we use environment variables within the YAML graph, it uses the old env variables.

If we decide that environment variables should not be shared in a distributed environment, which I can understand, maybe we should change our approach to using variables.
One alternative would be to add an `--env` argument that enables users to pass env variables when running dataflows, in the manner of Docker.
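Such a flag could look like the following (purely hypothetical syntax, mirroring `docker run --env`; this option does not currently exist in the dora CLI):

```
dora start dataflow.yaml --env CAMERA_INDEX=0 --env MODEL_PATH=/models/model.onnx
```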
Ah, understood! So I guess the main issue is the env expansion that we introduced in https://github.com/dora-rs/dora/commit/933dadc68bccc9e3fddd8030ee7c9fc68260ef7c. In some cases (such as the `$HOME` example in the commit description) we want the expansion to happen on the target machine, but in other cases we want it to happen on the CLI machine.

I'm fine with changing the behavior to do the env expansion on the CLI machine already, but I fear that there is always some chance of confusion, depending on what you expect. I'm not sure what we can do to avoid this, though, aside from documenting it clearly.
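To make the ambiguity concrete, a sketch (node and variable names are placeholders): with the current behavior, `$HOME` below resolves on the machine where the daemon spawns the node; with CLI-side expansion, it would resolve on the machine where `dora start` was run, and the two can differ in a remote deployment.

```yaml
nodes:
  - id: logger                # placeholder node
    path: logger.py
    env:
      LOG_DIR: $HOME/logs     # whose $HOME? the daemon's machine or the CLI's machine?
```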
Hello! I believe this unresolved issue is crucial for the better adoption of Dora, especially among less experienced users. At first glance, one might expect that our nodes and the Daemon would have access to the CLI's environment variables. However, it's completely understandable that this could lead to a lot of issues. I think there's a way to find an alternative solution.
### `build` Process

We already have a `build` process that can be assigned to nodes, which activates when we run `dora build`. This procedure is particularly useful when we need to install requirements or compile a program, as it's usually slow.
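As a reminder, a `build` step in the YAML looks something like this (the command and node are placeholders; the `build` key is the existing mechanism described above):

```yaml
nodes:
  - id: camera
    build: pip install -r requirements.txt  # runs during `dora build`, before any start
    path: camera.py
```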
### `preprocess` Procedure

It could be interesting to introduce another procedure called `preprocess`, which would contain a list of scripts to execute when running `dora start`, before trying to resolve and launch nodes. This would help enhance the environment of our dataflow or of specific nodes. For example:
```yaml
preprocess:
  - id: activate_python
    path: .venv/bin/activate.sh  # Will activate a Python environment
  - id: load_env_var_ext
    path: .my_env_var.sh         # Will export useful custom environment variables

nodes:
  - id: camera
    path: camera.py  # will be launched by the Python env, regardless of where the daemon or CLI was spawned
  - id: plot
    path: plot.py    # same
```
If we need different environment variables for a particular node, we could add them like this:
```yaml
preprocess:
  - id: activate_python
    path: .venv/bin/activate.sh
  - id: load_env_var_ext
    path: .my_env_var.sh

nodes:
  - id: camera
    path: camera.py
  - id: plot
    path: plot.py
  - id: weird_script
    preprocess:
      - id: activate_another_python
        path: .another_venv/bin/activate.sh
    path: weird_script.py  # will be launched by this other Python env
```
In the case of a distributed dataflow, absolute paths will need to be specified instead of relative ones, just like with everything else.
Let me know what you think! 😊
So I definitely feel there is something to be done about running commands before running the full set of nodes.

I think the preprocess idea is really interesting! I would maybe make a couple of changes.

Maybe what we could do is be slightly more GitHub Actions-like and make it possible to have a multiline shell script along the lines of:
```yaml
run: |
  source /opt/ros/humble/setup.bash
  cargo run --example cxx-ros2-dataflow --features="ros2-examples"
```
It would make it a bit more readable, as people don't need to search for the shell script, and also easier to get started with for anyone who has used GitHub Actions before.

Currently we should be able to (or could manage to) do something like this:
```yaml
- id: terminal-print
  build: cargo build -p terminal-print
  path: shell
  args: |
    source /opt/ros/humble/setup.bash
    cargo run --example cxx-ros2-dataflow --features="ros2-examples"
```
What do you think of having something like this?
P.S.: the preprocess should probably be per node, as that makes dependent steps easier to track and avoids conflicts.
I'm okay with using multi-line commands inside the `run` parameter, since filling both the `path` and `args` parameters may not be easy :)
In all honesty, I'm drawn to using `run` as well, as it is closer to GitHub Actions, but I think it makes the config harder to maintain since it means adding another key, and it would not be backward compatible. Also, there might be errors when both `path` and `run` are set.
OK, well, that's a good point. So we should stay with this pattern (`path` + `args`) and document it, until there is a real need for something different.
I think one of the issues is that if we only have `run`, we don't really have a way to control the environment in which the shell itself is spawned. Is it using bash, zsh, PowerShell, cmd, ...?

Those things we could manage with something like:
```yaml
- id: terminal-print
  build: cargo build -p terminal-print
  path: /bin/zsh
  args: |
    source /opt/ros/humble/setup.bash
    cargo run --example cxx-ros2-dataflow --features="ros2-examples"
```
**Describe the bug**
Sometimes an environment variable is updated before using the dora-cli. But because the coordinator and the daemon are background processes, they do not see the updated environment variable. This can generate some confusion.

**To Reproduce**
Steps to reproduce the behavior:

```
dora up
dora start dataflow.yaml
dora stop
dora destroy
```

**Expected behavior**
I would expect the CLI to pass the latest value to the daemon and coordinator.
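The stale-value behavior can be sketched as follows (assuming a dataflow whose node reads `MY_VAR`; the variable name and values are placeholders):

```
export MY_VAR=old
dora up                    # daemon & coordinator are spawned with MY_VAR=old
export MY_VAR=new          # only affects the current shell
dora start dataflow.yaml   # nodes spawned by the daemon still see MY_VAR=old
```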