ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

How does Cyber RT support distributed deployment of functional components? #7050

Closed Tony301X closed 5 years ago

Tony301X commented 5 years ago

Hi, folks

I want to deploy Apollo's functional components on different hosts, e.g., the Planning and Control modules on host A and the Perception module on host B. How will they communicate? Cyber RT has an RTPS transport mode, and I want to know how to configure it to support distributed deployment of functional components. Please show the detailed steps to carry it out.

Thanks in advance!

fengqikai1414 commented 5 years ago

Hi @Tony301X, you just need to modify line 20 of /apollo/cyber/setup.bash:

export CYBER_IP=127.0.0.1

Suppose you have two hosts, A and B, where the IP of A is 192.168.10.6 and the IP of B is 192.168.10.7. Set CYBER_IP to 192.168.10.6 on host A and to 192.168.10.7 on host B. Host A can then communicate with host B.
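If a host has several network interfaces, it may not be obvious which address belongs in CYBER_IP. The following is a minimal sketch, not part of Apollo, that asks the operating system which local IPv4 address it would use to reach the other host. The function names `local_ip_towards` and `cyber_ip_export_line` are hypothetical helpers introduced only for illustration.

```python
import socket


def local_ip_towards(peer_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the OS would use to reach peer_ip.

    Hypothetical helper: it relies only on the standard socket module.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packet; it only makes the
        # kernel pick a route and therefore a local source address.
        sock.connect((peer_ip, port))
        return sock.getsockname()[0]


def cyber_ip_export_line(peer_ip: str) -> str:
    """Build the export line to place in setup.bash or run in the shell."""
    return f"export CYBER_IP={local_ip_towards(peer_ip)}"
```

On host A, for example, `print(cyber_ip_export_line("192.168.10.7"))` would print the export line for whichever local address routes to host B.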

Tony301X commented 5 years ago

@fengqikai1414, thanks for your reply. I'll try it soon.

Tony301X commented 5 years ago

Hi @fengqikai1414, I used your method and tried tcpdump to capture network traffic, but there is no data flowing.

My steps were as follows. Prerequisite: I have two hosts, A and B, which can ping each other successfully.

  1. Modify CYBER_IP in /apollo/cyber/setup.bash on each host to that host's own IP address.
  2. On A, run bash scripts/bootstrap.sh to launch Dreamview in the docker env, toggle Sim Control on the Tasks tab, then toggle on the Routing module on the Module Controller tab.
  3. On B, open a terminal and run sudo tcpdump host B's IP
  4. On B, also in the docker env, run nohup mainboard -p compute_sched -d /apollo/modules/planning/dag/planning.dag &, only to find that no packets are captured on B.

NB: In another case, I uncommented the commented part in cyber/conf/cyber.pb.conf, originally as below:

    # transport_conf {
    #   shm_conf {
    #     # "multicast" "condition"
    #     notifier_type: "multicast"
    #     shm_locator {
    #       ip: "239.255.0.100"
    #       port: 8888
    #     }
    #   }
    #   participant_attr {
    #     lease_duration: 12
    #     announcement_period: 3
    #     domain_id_gain: 200
    #     port_base: 10000
    #   }
    #   communication_mode {
    #     same_proc: INTRA
    #     diff_proc: SHM
    #     diff_host: RTPS
    #   }
    #   resource_limit {
    #     max_history_depth: 1000
    #   }
    # }

    run_mode_conf {
      run_mode: MODE_REALITY
    }

    scheduler_conf {
      routine_num: 100
      default_proc_num: 16
    }

But the result is the same. I would appreciate it if you could provide some more insight into this problem.

fengqikai1414 commented 5 years ago

@Tony301X What is your network topology? Make sure the two hosts are on the same subnet of the local area network, like 192.168.10.6 and 192.168.10.7.
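A quick way to check the "same subnet" condition is sketched below with Python's standard ipaddress module. The helper name `same_subnet` is hypothetical, and the /24 prefix length is an assumption; use the prefix length actually configured on your interfaces.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Return True if ip_b lies in the subnet of ip_a for the given prefix.

    prefix_len=24 (netmask 255.255.255.0) is an assumed default, not
    something taken from the Apollo configuration.
    """
    network_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    return ipaddress.ip_address(ip_b) in network_a
```

With the assumed /24 prefix, `same_subnet("192.168.10.6", "192.168.10.7")` is true, while `same_subnet("192.168.10.6", "192.168.0.7")` is false.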

Tony301X commented 5 years ago

@fengqikai1414 Just as you suggested, I connected the two hosts through a single switch, with their IPs configured as 192.168.0.6 and 192.168.0.7 respectively, and they can ping each other successfully. But the result is still the same. BTW, have you done the same experiment (functional module(s) distributed deployment) yet?

Eclipsehelio commented 5 years ago

@Tony301X Did you execute cyber/setup.bash directly, or with 'source'? You can run the 'talker/listener' examples of Cyber RT on different hosts and check that they can communicate with each other. I think this is the simplest way to test Cyber RT's distributed function. By the way, you don't need to source the setup script, but you do have to export the CYBER_IP environment variable.

Eclipsehelio commented 5 years ago

@Tony301X I verified the distributed function on an ARM64 platform a month ago. Fast-RTPS undoubtedly has native support for distributed communication.

fengqikai1414 commented 5 years ago

@Tony301X

BTW, have you done the same experiment (functional module(s) distributed deployment) yet?

Yes, we have used this communication method in multiple products, and it works well. RTPS transfers data over UDP, so if the hosts are on the same subnet of the local area network, you should normally be able to capture the related data with tcpdump. Please check your network again, and use a UDP communication test program to make sure the hosts can receive data from each other.
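A UDP communication test of the kind mentioned above could look like the following minimal sketch. It uses only Python's standard socket module and is independent of Cyber RT; the function names and the default payload are hypothetical. It only checks unicast UDP between two addresses, so it says nothing about multicast, which RTPS discovery commonly relies on.

```python
import socket


def open_receiver(bind_ip: str, port: int) -> socket.socket:
    """Create a UDP socket bound to bind_ip:port (port 0 picks a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_ip, port))
    return sock


def wait_for_datagram(sock: socket.socket, timeout_s: float = 5.0):
    """Wait for one datagram; return (payload, sender_ip) or None on timeout."""
    sock.settimeout(timeout_s)
    try:
        data, (sender_ip, _sender_port) = sock.recvfrom(4096)
    except socket.timeout:
        return None
    return data, sender_ip


def send_once(dest_ip: str, port: int, payload: bytes = b"cyber-udp-test") -> int:
    """Send one UDP datagram and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (dest_ip, port))
```

As a usage example with an arbitrarily chosen port, host B would run `wait_for_datagram(open_receiver("0.0.0.0", 9999), 30.0)` and host A would then run `send_once("192.168.0.7", 9999)`. A `None` result on B suggests that UDP traffic is not getting through, for instance because of a firewall or container network setting.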

Tony301X commented 5 years ago

@Eclipsehelio

You can run the 'talker/listener' examples of Cyber RT on different hosts and check that they can communicate with each other. I think this is the simplest way to test Cyber RT's distributed function. By the way, you don't need to source the setup script, but you do have to export the CYBER_IP environment variable.

As you suggested, I ran the 'talker/listener' examples of Cyber RT on my network topology, and they worked. I'll try the real case more carefully. Thank you very much.

Tony301X commented 5 years ago

@Eclipsehelio I verified the distributed function on an ARM64 platform a month ago. Fast-RTPS undoubtedly has native support for distributed communication.

Hi, my real scenario is two hosts on a local network, both running Apollo in docker. One runs Dreamview normally, with Sim Control and the Routing module toggled on; the other runs the planning module with the mainboard command. When I ran tcpdump, no payload passed between them. What's more, the routing.INFO log file shows no subscription from the planning node for the RoutingResponse topic. What did your experiment environment look like?

alexiskovi commented 5 years ago

@Tony301X, same problem here. Did you solve this?

fengliu00 commented 5 years ago

As you mentioned, your two PCs are able to ping each other successfully. Can you try a small program to double-check Cyber's communication between the two PCs, as below:

  1. In both docker containers, change CYBER_IP in cyber/setup.bash to the local IP address.
  2. Run "source cyber/setup.bash" on both PCs.
  3. Run "python cyber/python/examples/talker.py" on the 1st PC.
  4. Run "python cyber/python/examples/listener.py" on the 2nd PC.

Let us know whether the communication succeeds on your side.

thymbahutymba commented 3 years ago

Hey guys, I'm trying to reproduce the same distributed deployment on Apollo branch r6.0.0 for x86_64. I created 2 containers on the docker bridge network, and in one of them (the one that runs Dreamview) I bound port 9090 for communication with the SVL simulator.

In the first container I ran export CYBER_IP=172.17.0.11 (its IP, which I found with ip addr show and also checked with docker network inspect bridge), while in the second container I ran export CYBER_IP=172.17.0.12. After that I tried the talker/listener example in both Python and C++, and they work fine.

Then I tried to start Apollo to see whether it worked: in the first container I executed ./scripts/bootstrap_lgsvl.sh and ./scripts/bridge.sh, and in the second container I ran cyber_launch start modules/localization/launch/localization.launch. Nothing happens. I can start localization from Dreamview or with the same command in the first container, but not in the second one. I think the steps I took match what has been explained in this issue, yet it is not working. What am I missing? Why is it not working? (Ping between the two containers works fine.)

cherishTMYY commented 1 year ago

Did you solve this problem later? I encountered the same problem, but I was running outside the container.