ethz-asl / rl-navigation

BSD 3-Clause "New" or "Revised" License

Training time issues with the network #4

Closed sg774 closed 4 years ago

sg774 commented 4 years ago

Thank you for this open source implementation. I am currently trying to train the network on my system with the default settings in options.py, using a Titan V GPU and no prior IL weight initialization. However, each epoch is taking more than an hour to complete instead of the roughly 3 minutes stated in the paper. How do I address this? Does it have something to do with the value published to /clock by stage_ros?

Weeeesen commented 4 years ago

@sg774 Hi there, I am facing similar issues with long training time, approximately 40 minutes per epoch. Have you managed to solve this issue?

sg774 commented 4 years ago

@sixfeetzero You need to change the rate for stage_ros. In stageros.cpp, navigate to the end of the file, find the declaration of ros::WallRate, and increase its value. Setting it to 1000 let the code complete around 3-4 epochs per hour on a CPU, even though the network architecture in my research was much more sophisticated and computationally expensive than the one presented in this repository.
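
For reference, the change looks roughly like this. This is only a sketch based on the stock stage_ros main loop; names such as sn and gui come from stageros.cpp, and the exact surrounding code may differ between versions:

```cpp
// stageros.cpp -- near the end of main() (sketch; details vary by version).
// The fixed wall rate caps how often the loop steps the Stage world, so a
// low value throttles simulation speed no matter how fast your hardware is.
ros::WallRate r(1000.0);  // stock value is much lower (e.g. 10.0)

while (ros::ok() && !sn.world->TestQuit())
{
    if (gui)
        Stg::World::Run();   // GUI mode: Stage drives its own update loop
    else
        sn.UpdateWorld();    // headless mode: step the world manually

    r.sleep();               // this sleep is what limits wall-clock speed
}
```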

pfmark commented 4 years ago

Sorry for the long wait time. Exactly, you can either change the rate in the Stage source code or in the Stage GUI when starting the simulation (there you can also choose to simulate as fast as possible).

minded-hua commented 4 years ago

@sg774 @kermeed I ran the code but hit a problem: the loaded robot model can't run within the map. I suspect the origins of the /odom and /map coordinate frames are mismatched, but I haven't been able to fix it.

[rviz screenshot]

Did you encounter this problem? Looking forward to your reply. Thanks.

sg774 commented 4 years ago

@minded-hua Add this to the launch file:
<node pkg="tf" type="static_transform_publisher" name="link1_broadcaster" args="0 0 0 0 0 0 map odom 100" />
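
For context on what this does: tf's static_transform_publisher takes the arguments x y z yaw pitch roll frame_id child_frame_id period_in_ms, so this line publishes an identity transform from map to odom every 100 ms, pinning the two origins together. The zero offsets are an assumption; if your map and odom origins genuinely differ, put the actual offset in the first six arguments instead.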