Closed. vmayoral closed this issue 2 years ago.
@vmayoral can you try again on the humble branch? I've made a few changes and I'm now able to run the talker example without issue. If you do run into an issue the ssh changes I made should make it easier to find the root cause.
@jeffi, gave it a try again. Failed twice with the `humble` branch:
Attempt 1 ❌ | Attempt 2 ❌ |
---|---|
Here are a few additional observations:
- The `humble` branch is inconsistent from a ROS 2 perspective: it should build upon `rolling` here https://github.com/BerkeleyAutomation/FogROS2/blob/humble/Dockerfile#L1, not on top of Galactic (Humble is branching out of Rolling). I believe @KDharmarajanDev took care of this in `main`.
- The `humble` branch is missing many of the changes introduced in `main` (sorry I sent them there, I just didn't know that `humble` was the actual "good" one, since generally development happens in `main` and then you branch things out when ready, as appropriate). I think it'd be nice to cherry-pick many of the changes introduced in `main` regarding the README polishing, or just rebase things together.
We branched `humble` since it will undergo many changes that are inconsistent with the paper. We branched recently, so it should be straightforward to merge everything onto the `humble` branch. Anything on `main` that is inconsistent with the paper, we may need to back out. Sorry for not being clear about this earlier.
What are the steps you're taking to reproduce the issues you see? Also, are you using an EC2 instance near you? Or `us-west-1`?
I've been using us-west-1. This is known to have worked in the past; I would expect higher latencies, but still interoperability.
Do you guys get issues with machines in Europe?
I think some DDS changes have occurred. So it might be the issue. Can you try with a closer instance? That’s best practice anyways.
I tried connecting from Berkeley to eu-west-1 and didn't run into any issues.
I just ran the humble branch with docker, and it works fine on my end
All right folks, gave it yet another try with the `humble` branch and failed again ❌:
git clone --recurse-submodules https://github.com/BerkeleyAutomation/FogROS2 -b humble
cd FogROS2
docker build -t fogros2:latest .
docker run -it --rm --net=host --cap-add=NET_ADMIN fogros2
aws configure
source install/setup.bash
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export CYCLONEDDS_URI=file://$(pwd)/install/share/fogros2/configs/cyclonedds.xml
# edited src/fogros2/fogros2_examples/launch/talker.launch.py, and switched to "eu-west-1", tried various combinations
ros2 launch fogros2_examples talker.launch.py
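Since `CYCLONEDDS_URI` above is built from `$(pwd)`, it silently points at a missing file if the export is run from anywhere other than the workspace root. A minimal sketch of a pre-launch sanity check follows; the helper name `check_dds_env` is made up for illustration and is not part of FogROS2, and it only handles the `file://` form of the URI used in these steps.

```python
import os
from pathlib import Path
from typing import List, Mapping

# Value exported in the reproduction steps above.
EXPECTED_RMW = "rmw_cyclonedds_cpp"


def check_dds_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the env looks sane.

    Hypothetical helper for debugging, not a FogROS2 API.
    """
    problems = []

    rmw = env.get("RMW_IMPLEMENTATION")
    if rmw != EXPECTED_RMW:
        problems.append(f"RMW_IMPLEMENTATION is {rmw!r}, expected {EXPECTED_RMW!r}")

    uri = env.get("CYCLONEDDS_URI", "")
    if not uri.startswith("file://"):
        # Only the file:// form from the steps above is checked here.
        problems.append("CYCLONEDDS_URI is unset or not a file:// URI")
    elif not Path(uri[len("file://"):]).is_file():
        problems.append(f"CYCLONEDDS_URI points to a missing file: {uri}")

    return problems


if __name__ == "__main__":
    for problem in check_dds_env(os.environ):
        print("WARNING:", problem)
```

Running it in the same shell right before `ros2 launch` would rule out a mis-set environment as the cause.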
`eu-west-1` ❌
AMI's not available in this location:
root@xilinx:/home/root/fog_ws# ros2 launch fogros2_examples talker.launch.py
[INFO] [launch]: All log files can be found below /root/.ros/log/2022-04-14-17-18-25-804926-xilinx-709
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [root]: init
using ROS workspace: /home/root/fog_ws
creating EC2 instance
security group id is ['sg-03aff72c987f1fb2b']
[ERROR] [launch]: Caught exception in launch (see debug for traceback): Caught exception when trying to load file of format [py]: An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-09175f2ca3c3dc67c]' does not exist
I'd be interested to know if someone manages to reproduce this in Europe.
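For context, AMI IDs are specific to a region, so an ID that resolves in one region produces `InvalidAMIID.NotFound` in another unless the image has been copied there. A rough sketch of failing early with a clearer message, before any EC2 call is made, could look like the following. The function, the exception class and the table are hypothetical, and mapping the AMI from the log above to `us-west-1` is an assumption, not something confirmed in this thread.

```python
from typing import Mapping


class UnsupportedRegionError(LookupError):
    """Raised when no AMI is registered for the requested region."""


def ami_for_region(region: str, ami_by_region: Mapping[str, str]) -> str:
    """Look up the AMI for a region, failing with an explicit message."""
    try:
        return ami_by_region[region]
    except KeyError:
        known = ", ".join(sorted(ami_by_region)) or "none"
        raise UnsupportedRegionError(
            f"no AMI registered for region {region!r} (known regions: {known})"
        ) from None


# Hypothetical table. ASSUMPTION: the AMI ID from the log is registered in
# us-west-1; a real table would need one entry per supported region.
AMI_BY_REGION = {"us-west-1": "ami-09175f2ca3c3dc67c"}

if __name__ == "__main__":
    try:
        ami_for_region("eu-west-1", AMI_BY_REGION)
    except UnsupportedRegionError as err:
        print(err)
```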
`us-west-1` ❌
[INFO] [launch]: All log files can be found below /home/ubuntu/.ros/log/2022-04-14-17-27-39-176746-ip-172-31-20-210-8020
[INFO] [launch]: Default logging verbosity is set to INFO
action added
[INFO] [talker-1]: process started with pid [8232]
[talker-1] 1649957263.505701 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7410 failed with retcode -12
[talker-1] 1649957263.505735 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7412 failed with retcode -12
[talker-1] 1649957263.505751 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7414 failed with retcode -12
[talker-1] 1649957263.505764 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7416 failed with retcode -12
[talker-1] 1649957263.505777 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7418 failed with retcode -12
[talker-1] 1649957263.505790 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7420 failed with retcode -12
[talker-1] 1649957263.505803 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7422 failed with retcode -12
[talker-1] 1649957263.505816 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7424 failed with retcode -12
[talker-1] 1649957263.505840 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7426 failed with retcode -12
[talker-1] 1649957263.605819 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7410 failed with retcode -12
[talker-1] 1649957263.605855 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7412 failed with retcode -12
[talker-1] 1649957263.605869 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7414 failed with retcode -12
[talker-1] 1649957263.605881 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7416 failed with retcode -12
[talker-1] 1649957263.605894 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7418 failed with retcode -12
[talker-1] 1649957263.605906 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7420 failed with retcode -12
[talker-1] 1649957263.605918 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7422 failed with retcode -12
[talker-1] 1649957263.605929 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7424 failed with retcode -12
[talker-1] 1649957263.605942 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7426 failed with retcode -12
[talker-1] [WARN] [1649957264.059897731] [minimal_publisher]: Publishing: "Hello World: 0"
[talker-1] [WARN] [1649957264.547799905] [minimal_publisher]: Publishing: "Hello World: 1"
[talker-1] [WARN] [1649957265.047765659] [minimal_publisher]: Publishing: "Hello World: 2"
[talker-1] [WARN] [1649957265.547783128] [minimal_publisher]: Publishing: "Hello World: 3"
[talker-1] [WARN] [1649957266.047788410] [minimal_publisher]: Publishing: "Hello World: 4"
[talker-1] [WARN] [1649957266.547765959] [minimal_publisher]: Publishing: "Hello World: 5"
[talker-1] [WARN] [1649957267.047778546] [minimal_publisher]: Publishing: "Hello World: 6"
[talker-1] [WARN] [1649957267.547725197] [minimal_publisher]: Publishing: "Hello World: 7"
[talker-1] [WARN] [1649957268.047791400] [minimal_publisher]: Publishing: "Hello World: 8"
[talker-1] [WARN] [1649957268.547799940] [minimal_publisher]: Publishing: "Hello World: 9"
[talker-1] [WARN] [1649957269.047773415] [minimal_publisher]: Publishing: "Hello World: 10"
[talker-1] [WARN] [1649957269.547797331] [minimal_publisher]: Publishing: "Hello World: 11"
[talker-1] [WARN] [1649957270.047731141] [minimal_publisher]: Publishing: "Hello World: 12"
[talker-1] [WARN] [1649957270.547734673] [minimal_publisher]: Publishing: "Hello World: 13"
[talker-1] [WARN] [1649957271.047857520] [minimal_publisher]: Publishing: "Hello World: 14"
[talker-1] [WARN] [1649957271.547791794] [minimal_publisher]: Publishing: "Hello World: 15"
[talker-1] 1649957271.605965 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7410 failed with retcode -12
[talker-1] 1649957271.606008 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7412 failed with retcode -12
[talker-1] 1649957271.606022 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7414 failed with retcode -12
[talker-1] 1649957271.606033 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7416 failed with retcode -12
[talker-1] 1649957271.606045 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7418 failed with retcode -12
[talker-1] 1649957271.606057 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7420 failed with retcode -12
[talker-1] 1649957271.606069 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7422 failed with retcode -12
[talker-1] 1649957271.606081 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7424 failed with retcode -12
[talker-1] 1649957271.606093 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7426 failed with retcode -12
[talker-1] [WARN] [1649957272.047780478] [minimal_publisher]: Publishing: "Hello World: 16"
[talker-1] [WARN] [1649957272.547810903] [minimal_publisher]: Publishing: "Hello World: 17"
[talker-1] [WARN] [1649957273.047775785] [minimal_publisher]: Publishing: "Hello World: 18"
[talker-1] [WARN] [1649957273.547864499] [minimal_publisher]: Publishing: "Hello World: 19"
[talker-1] [WARN] [1649957274.047790936] [minimal_publisher]: Publishing: "Hello World: 20"
[talker-1] [WARN] [1649957274.547792087] [minimal_publisher]: Publishing: "Hello World: 21"
[talker-1] [WARN] [1649957275.047782828] [minimal_publisher]: Publishing: "Hello World: 22"
[talker-1] [WARN] [1649957275.547805925] [minimal_publisher]: Publishing: "Hello World: 23"
[talker-1] [WARN] [1649957276.047801592] [minimal_publisher]: Publishing: "Hello World: 24"
[talker-1] [WARN] [1649957276.547817732] [minimal_publisher]: Publishing: "Hello World: 25"
[talker-1] [WARN] [1649957277.047784636] [minimal_publisher]: Publishing: "Hello World: 26"
[talker-1] [WARN] [1649957277.548027508] [minimal_publisher]: Publishing: "Hello World: 27"
[talker-1] [WARN] [1649957278.047788750] [minimal_publisher]: Publishing: "Hello World: 28"
[talker-1] [WARN] [1649957278.547805068] [minimal_publisher]: Publishing: "Hello World: 29"
[talker-1] [WARN] [1649957279.047773950] [minimal_publisher]: Publishing: "Hello World: 30"
[talker-1] [WARN] [1649957279.547744832] [minimal_publisher]: Publishing: "Hello World: 31"
[talker-1] 1649957279.606107 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7410 failed with retcode -12
[talker-1] 1649957279.606138 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7412 failed with retcode -12
[talker-1] 1649957279.606152 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7414 failed with retcode -12
[talker-1] 1649957279.606163 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7416 failed with retcode -12
[talker-1] 1649957279.606194 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7418 failed with retcode -12
[talker-1] 1649957279.606212 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7420 failed with retcode -12
[talker-1] 1649957279.606224 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7422 failed with retcode -12
[talker-1] 1649957279.606236 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7424 failed with retcode -12
[talker-1] 1649957279.606248 [0] tev: ddsi_udp_conn_write to udp/10.0.0.1:7426 failed with retcode -12
[talker-1] [WARN] [1649957280.047784027] [minimal_publisher]: Publishing: "Hello World: 32"
[talker-1] [WARN] [1649957280.547778110] [minimal_publisher]: Publishing: "Hello World: 33"
[talker-1] [WARN] [1649957281.047787163] [minimal_publisher]: Publishing: "Hello World: 34"
[talker-1] [WARN] [1649957281.547907442] [minimal_publisher]: Publishing: "Hello World: 35"
[talker-1] [WARN] [1649957282.047792322] [minimal_publisher]: Publishing: "Hello World: 36"
[talker-1] [WARN] [1649957282.547796226] [minimal_publisher]: Publishing: "Hello World: 37"
[talker-1] [WARN] [1649957283.047783401] [minimal_publisher]: Publishing: "Hello World: 38"
[talker-1] [WARN] [1649957283.547781548] [minimal_publisher]: Publishing: "Hello World: 39"
I can see there's been some decent cleanup in the `humble` branch, but from what I can see at #15, we're still missing quite a few things to release into Humble. Critical aspects include:
Right now I can see we're duplicating official ROS 2 packages, which will block this from being accepted.
I repeated the steps that @vmayoral mentioned and hit the same problem whenever I start the fogros container for the first time.
My theory is that if the VPN's wg0 interface exists but the /etc/wireguard/wg0 file does not (because the container is new), the interface is not deleted and restarted, so the setup does not succeed.
The fix is in #22.
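To illustrate the theory (this is not the code from #22, just a sketch of the condition): with `--net=host` the interface lives on the host and survives the container, while the config inside a fresh container is gone. The helper names below are made up, the `/sys/class/net` check assumes Linux, and the default config path follows the usual `wg-quick` convention of `<interface>.conf`, which is an assumption about the setup here.

```python
import subprocess
from pathlib import Path


def is_stale(interface_exists: bool, config_exists: bool) -> bool:
    """The interface is stale when it exists but its config file is gone."""
    return interface_exists and not config_exists


def reset_if_stale(interface: str = "wg0",
                   config_path: str = "/etc/wireguard/wg0.conf") -> bool:
    """Delete a leftover interface so the VPN setup can recreate it.

    Returns True if the interface was deleted. Hypothetical helper;
    deleting the link requires root or NET_ADMIN.
    """
    interface_exists = Path("/sys/class/net", interface).exists()  # Linux only
    config_exists = Path(config_path).exists()
    if is_stale(interface_exists, config_exists):
        # Without the config, a config-based teardown has nothing to work
        # from, so remove the link directly.
        subprocess.run(["ip", "link", "delete", "dev", interface], check=True)
        return True
    return False
```

Calling `reset_if_stale()` before bringing the VPN up would cover the "new container, old interface" case described above.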
(Just getting the issue up to date from conversations over Slack and email)
The issue appears to stem from `ROS_DOMAIN_ID`. When we set it on the robot, the setting is not transmitted to the cloud instance. The result is that the robot and cloud were talking on different port ranges, and thus never to each other. The workaround for now is to leave `ROS_DOMAIN_ID` unset on the robot (to match the cloud). The fix will be to match the `ROS_DOMAIN_ID` setting on the cloud.
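To make the "different port ranges" point concrete, here is a small sketch of the default port mapping from the DDSI-RTPS specification (port base 7400, domain gain 250, participant gain 2). The function names are made up for illustration, and it assumes the DDS vendor's default port parameters are in use. With domain 0 it yields the 7410, 7412, ... 7426 unicast ports seen in the log earlier in this thread.

```python
from typing import Dict, Mapping

# Default DDSI-RTPS port parameters.
PB, DG, PG = 7400, 250, 2          # port base, domain gain, participant gain
D0, D1, D2, D3 = 0, 10, 1, 11      # per-traffic-type offsets


def effective_domain_id(env: Mapping[str, str]) -> int:
    """ROS 2 falls back to domain 0 when ROS_DOMAIN_ID is unset."""
    return int(env.get("ROS_DOMAIN_ID", "0"))


def rtps_ports(domain_id: int, participant_id: int = 0) -> Dict[str, int]:
    """Well-known UDP ports for one participant in one domain."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,
        "user_multicast": base + D2,
        "discovery_unicast": base + D1 + PG * participant_id,
        "user_unicast": base + D3 + PG * participant_id,
    }


if __name__ == "__main__":
    robot = effective_domain_id({"ROS_DOMAIN_ID": "1"})  # set on the robot
    cloud = effective_domain_id({})                      # never propagated
    print("robot:", rtps_ports(robot))
    print("cloud:", rtps_ports(cloud))
```

A robot with `ROS_DOMAIN_ID=1` sends discovery traffic to ports starting at 7650, while a cloud instance left at the default listens from 7400, so the two never find each other even though the VPN link itself is fine.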
Tried the latest changes in `humble` in a clean, Docker-based setup and failed ❌:
I'll be taking a few days off but will re-engage with the discussion next week!
Superseded by #36
I'm unable to run https://github.com/BerkeleyAutomation/FogROS2/blob/main/fogros2_examples/launch/talker.launch.py successfully and obtain basic interoperability, neither edge-to-cloud nor edge-to-edge.
From my inspection, VPN works just fine but there's something wrong in the setup. ROS 2 CLI daemons aren't able to see simple Nodes launched (manually) across machines in the VPN.
This deserves further attention.