TempleRAIL / drl_vo_nav

[T-RO 2023] DRL-VO: Learning to Navigate Through Crowded Dynamic Scenes Using Velocity Obstacles
https://doi.org/10.1109/TRO.2023.3257549
GNU General Public License v3.0

Support on ros2 #3

Closed ConnerQiu closed 1 year ago

ConnerQiu commented 1 year ago

Hi, I just noticed this great work and would like to follow it. However, my workstation and server both run Ubuntu 22.04, which is not officially supported by ros1. I realize that there are ways to build ros1 from source on Ubuntu 22.04, but I think this would stop many people from using it. So I would like to ask how to migrate this repo to ros2. Is that something a ROS beginner like me could finish? Or do you have any plans to upgrade it to support ros2? Thanks~

zzuxzt commented 1 year ago

Hi, thanks for your interest in our work and suggestions for support on ros2. I just did a simple survey about support for ros2 and found out that even the basic turtlebot2 packages don't support ros2. So it might not be a good time to provide support on ros2, but we will provide support on ros2 in the future. Regarding your issue on Ubuntu 22.04, I recommend using Singularity Containers to run and build your code, because sandbox containers provide any environment you need, regardless of your system. I hope that temporarily resolves your concerns.
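
As a rough sketch of the container route suggested above: the helper below only prints the commands for a writable ROS Noetic sandbox so they can be reviewed before running. It assumes Singularity is installed; the base image `docker://ros:noetic-ros-base` and the sandbox directory name are assumptions for illustration, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: print the commands for a writable ROS Noetic sandbox.
# Assumptions (not from this thread): the base image
# docker://ros:noetic-ros-base and the sandbox directory name given as $1.
print_sandbox_cmds() {
    sandbox_dir=${1:-ros_noetic_sandbox}
    # Build a writable, directory-based container from the Docker image.
    printf '%s\n' "singularity build --sandbox $sandbox_dir docker://ros:noetic-ros-base"
    # Open a shell inside it; --writable keeps changes such as installed packages.
    printf '%s\n' "singularity shell --writable $sandbox_dir"
}

print_sandbox_cmds "$@"
```

Depending on how Singularity is installed, the build step may need root privileges or the `--fakeroot` option.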

ConnerQiu commented 1 year ago

Hi zhanteng, thanks for your reply and for taking the time to do the survey! In that case, I think using ros1 for now is the more suitable way. We are actually considering downgrading some of our machines to 20.04 haha. Thanks again, I guess we will have a lot to discuss in the future~

zzuxzt commented 1 year ago

Sounds good to me. Anyway, trying ros2 is not a bad thing. I forgot to mention: if you use Ubuntu 20.04, ros2 can also be bridged to ros1 using ros1_bridge. If you have any questions in the future, feel free to let me know.
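
For reference, a minimal sketch of what the bridge setup could look like on Ubuntu 20.04. The helper prints the steps for two terminals rather than running them. The distro names (`noetic` and `foxy`), the `/opt/ros` install prefix, and the local master URI are assumptions, and the ros1_bridge package must already be installed for the ROS 2 distro.

```shell
#!/bin/sh
# Sketch: print the steps for bridging ROS 1 and ROS 2 on one machine.
# Assumptions (not from this thread): ROS 1 "noetic" and ROS 2 "foxy" are
# installed under /opt/ros, and ros1_bridge is installed for ROS 2.
print_bridge_steps() {
    ros1=${1:-noetic}
    ros2=${2:-foxy}
    cat <<EOF
# Terminal 1: start the ROS 1 master
. /opt/ros/$ros1/setup.sh
roscore

# Terminal 2: source both distros, then start the bridge
. /opt/ros/$ros1/setup.sh
. /opt/ros/$ros2/setup.sh
export ROS_MASTER_URI=http://localhost:11311
ros2 run ros1_bridge dynamic_bridge
EOF
}

print_bridge_steps "$@"
```

With `dynamic_bridge`, a topic is bridged once matching publishers and subscribers exist on both sides.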

ConnerQiu commented 1 year ago

@zzuxzt Hi Zhanteng, I have several questions about running the experiment:

  1. Running on a server. I am now using Docker on a server to run the whole project. When I launch drl_vo_nav_train.py, whether I pass gui:=false or modify the default value inside the file, it still generates the messages below. Do you have any idea how to solve this?
    
    process[rviz-25]: started with pid [27101]
    [pedsim_simulator-10] process has died [pid 26959, exit code -6, cmd /root/ros_projects/drl_vo_ws/devel/lib/pedsim_simulator/pedsim_simulator __name:=pedsim_simulator __log:=/root/.ros/log/8f7fd8cc-e23d-11ed-a62f-0242ac110002/pedsim_simulator-10.log].
    log file: /root/.ros/log/8f7fd8cc-e23d-11ed-a62f-0242ac110002/pedsim_simulator-10*.log
    qt.qpa.xcb: could not connect to display 
    qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
    This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

    Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, xcb.

    Waiting for gazebo services...
    [rviz-25] process has died [pid 27101, exit code -6, cmd /opt/ros/noetic/lib/rviz/rviz -d /root/ros_projects/drl_vo_ws/src/robot_gazebo/rviz/navigation.rviz __name:=rviz __log:=/root/.ros/log/8f7fd8cc-e23d-11ed-a62f-0242ac110002/rviz-25.log].
    log file: /root/.ros/log/8f7fd8cc-e23d-11ed-a62f-0242ac110002/rviz-25*.log
    [WARN] [1682298956.392120, 0.000000]: Could not get robot pose
    [ INFO] [1682298956.481021334]: Finished loading Gazebo ROS API Plugin.


  2. About training with multiple GPUs. In the paper you mentioned that you trained with a laptop and an 8*V100 server. Three questions about this:
     1) How do you connect these two machines (using ROS over the network? Transferring the collected buffer may be time consuming)?
     2) Is there a special reason to use this setup? I think running the simulator on the server may be faster.
     3) How long does the training process take?

Thanks~

zzuxzt commented 1 year ago

Hi Ronghe, here are answers to your questions:

  1. This happens because your server has no GUI support, so you need to create a fake GUI to render the Gazebo stuff. I updated the code and ReadMe to include bash commands for training on a server without a GUI. Downloading "run_drl_vo_policy_training_server.sh" to your workspace and running it should solve the problem:
    wget https://raw.githubusercontent.com/TempleRAIL/drl_vo_nav/drl_vo/run_drl_vo_policy_training_server.sh
    sh run_drl_vo_policy_training_server.sh ~/drl_vo_runs
  2. You may have misremembered: I only used one GPU on the server (which has 8 GPUs) to train the policy. Since the training time mainly depends on the Gazebo simulator, multiple GPUs do not provide faster training. In my case, it took a week to train the policy due to the slow Gazebo simulator.
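
To illustrate the "fake GUI" idea from item 1 in general terms, here is a minimal sketch. It is not the contents of run_drl_vo_policy_training_server.sh. It assumes `xvfb-run` (from the xvfb package) is installed, and the package and launch file names in the example call are placeholders. The helper only builds the command line, so it can be inspected before anything is launched.

```shell
#!/bin/sh
# Sketch: wrap a GUI-dependent launch in a virtual X server when no display exists.
# Assumption (not from this thread): xvfb-run is installed on the server.

# Print the wrapper to prepend: empty if a display exists, xvfb-run otherwise.
display_wrapper() {
    if [ -z "${DISPLAY:-}" ]; then
        # -a picks a free X server number automatically.
        printf '%s' "xvfb-run -a"
    fi
}

# Build (but do not execute) the roslaunch command line from the arguments.
build_launch_cmd() {
    wrapper=$(display_wrapper)
    printf '%s\n' "${wrapper:+$wrapper }roslaunch $*"
}

# Example call; my_pkg and my_train.launch are placeholder names.
build_launch_cmd my_pkg my_train.launch gui:=false
```

On a machine without a display, this prints the roslaunch line prefixed with `xvfb-run -a`, and on a desktop session it prints the plain roslaunch line.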

ConnerQiu commented 1 year ago

Hi Zhanteng, thanks for your quick update, now the script runs smoothly! As for the time it takes, it is really quite long. I realize that Gazebo runs quite slowly, but this still exceeds my expectations haha. Maybe we could think about how to shorten the training time in the future. For now I may focus on running the baseline for a while. Thanks for your support, and have a great day~