RGring / drl_local_planner_ros_stable_baselines

BSD 3-Clause "New" or "Revised" License

Execute trained ppo-agent error #5

Closed liuqi8827 closed 4 years ago

liuqi8827 commented 4 years ago

Hi,

Thanks for your great work. There are some problems when I test the trained model.

1. I trained the model successfully.
2. However, when I execute the trained ppo-agent, the error shown in the terminal is: "Failed to call stepsimulation service".
3. Then, when I run "roslaunch rl_agent run_ppo2_agent.launch mode:="train"", the error shown in the terminal is (see also the argument-handling note right after this list):
   File "/home/hitsz/drl_local_planner_ws/src/drl_local_planner_ros_stable_baselines/rl_agent/scripts/run_scripts/run_ppo.py", line 152, in num_stacks=int(sys.argv[8]))
   ValueError: invalid literal for int() with base 10: '__name:=run_ppo.py'
4. Then I modified run_ppo2_agent.launch from
   node pkg="rl_agent" type="run_ppo.py" name="run_ppo.py" output="screen" args="ppo2_foo CnnPolicy_multi_input_vel2 $(arg mode) 1 0 1 ped"
   to
   node pkg="rl_agent" type="run_ppo.py" name="run_ppo.py" output="screen" args="ppo2_foo CnnPolicy_multi_input_vel2 $(arg mode) 1 0 1 ped 4" /
   and the error from step 3 is solved.
5. However, another error happens in src/drl_local_planner_ros_stable_baselines/rl_agent/src/rl_agent/env_wrapper/ros_env_disc_img_vel.py, line 36:
   img_width = rospy.get_param("%s/rl_agent/img_width_pos" % ns) + rospy.get_param("%s/rl_agent/img_width_neg" % ns)
   The error shown in the terminal is:
   raise KeyError(key)
   KeyError: 'sim1/rl_agent/img_width_pos'
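For context on the ValueError in step 3: roslaunch appends remapping arguments such as "__name:=run_ppo.py" and "__log:=..." to sys.argv, so a missing positional argument is silently replaced by one of them instead of failing with an IndexError. A minimal sketch of a more defensive way to read argument 8 (the index and its meaning follow run_ppo.py as quoted above; rospy.myargv is standard rospy):

```python
import sys

import rospy

# roslaunch appends remapping arguments ("__name:=...", "__log:=...") to
# sys.argv; rospy.myargv() returns a copy of argv with them stripped, so a
# missing positional argument fails loudly instead of being parsed as
# "__name:=run_ppo.py".
argv = rospy.myargv(argv=sys.argv)

if len(argv) <= 8:
    sys.exit("Expected num_stacks as argument 8, but got only %d arguments."
             % (len(argv) - 1))

num_stacks = int(argv[8])
```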

Have you seen these problems before? Can you give me some suggestions?

Thanks a lot!

liuqi8827 commented 4 years ago

To clarify point 2: when I execute the trained ppo-agent after the command "roslaunch rl_bringup setup.launch ns:="sim1" rl_params:="rl_params_scan"", the error shown in the terminal is: "Failed to call stepsimulation service".
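One way to narrow this down is to check whether the simulation's step service is actually advertised before the agent tries to call it. A minimal sketch (the service name below is only a placeholder; "rosservice list" shows the name the setup.launch simulation really advertises):

```python
import rospy

# Placeholder service name -- look up the real one with `rosservice list`.
STEP_SERVICE = "/sim1/step_simulation"

rospy.init_node("step_service_check", anonymous=True)
try:
    rospy.wait_for_service(STEP_SERVICE, timeout=5.0)
    rospy.loginfo("Service %s is available.", STEP_SERVICE)
except rospy.ROSException:
    rospy.logwarn("Service %s is not advertised -- is the simulation from "
                  "setup.launch actually running?", STEP_SERVICE)
```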

RGring commented 4 years ago

Did you train the agent using the steps in section Example Usage, 1. Train agent? Or did you do some custom training / modify some parameters? The default parameters are shown in the call below.
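Laid out as Python for readability (config, agent_name, num_envs, robot_radius and stage are variables presumably defined earlier in the training script):

```python
train_agent_ppo2(config, agent_name,
                 gamma=0.99,
                 n_steps=128,
                 ent_coef=0.005,
                 learning_rate=0.00025,
                 cliprange=0.2,
                 total_timesteps=10000000,
                 policy="CNN1DPolicy_multi_input",
                 num_envs=num_envs,
                 nminibatches=1,
                 noptepochs=1,
                 debug=True,
                 rew_fnc=19,
                 num_stacks=3,
                 stack_offset=5,
                 disc_action_space=False,
                 robot_radius=robot_radius,
                 stage=stage,
                 pretrained_model_name="ppo2_foo",
                 task_mode="ped")
```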

Since ros_env_disc_img_vel.py is accessed, I guess you are using a different policy, one that uses images instead of laser scans?

In that case you have to start the simulation with the image parameters: "roslaunch rl_bringup setup.launch ns:="sim1" rl_params:="rl_params_img_dyn"". Make sure that you use the same image configuration in rl_params_img_dyn.yaml as you had during training.
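As a quick check, you can verify on the parameter server that the image parameters were actually loaded. A minimal sketch (the two key names come from the KeyError above; "sim1" is the namespace used in this thread):

```python
import rospy

rospy.init_node("img_param_check", anonymous=True)

ns = "sim1"
for key in ("img_width_pos", "img_width_neg"):
    full = "%s/rl_agent/%s" % (ns, key)
    if rospy.has_param(full):
        rospy.loginfo("%s = %s", full, rospy.get_param(full))
    else:
        rospy.logwarn("%s is missing -- was setup.launch started with "
                      "rl_params:=rl_params_img_dyn?", full)
```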

liuqi8827 commented 4 years ago

Thanks for your quick reply. Yes, I trained the agent using the steps in section Example Usage, 1. Train agent:
1.1 Open first terminal (roscore): roscore
1.2 Open second terminal (simulation): roslaunch rl_bringup setup.launch ns:="sim1" rl_params:="rl_params_scan"
1.3 Open third terminal (DRL agent): source /bin/activate, then python rl_agent/scripts/train_scripts/train_ppo.py
1.4 Open fourth terminal (visualization): roslaunch rl_bringup rviz.launch ns:="sim1"
1.5 The parameters are the default parameters: train_agent_ppo2(config, agent_name, gamma=0.99, n_steps=128, ent_coef=0.005, learning_rate=0.00025, cliprange=0.2, total_timesteps=10000000, policy="CNN1DPolicy_multi_input", num_envs=num_envs, nminibatches=1, noptepochs=1, debug=True, rew_fnc=19, num_stacks=3, stack_offset=5, disc_action_space=False, robot_radius=robot_radius, stage=stage, pretrained_model_name="ppo2_foo", task_mode="ped")

After the four commands and the long training process, I got the database folder, which contains /database/agents/ppo2_foo/ppo2_foo.pkl and /database/agents/ppo2_foo/ppo2_foo_stage_0.zip. I renamed ppo2_foo_stage_0.zip to ppo2_foo.zip.
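As a sanity check on the renamed archive, it can be loaded directly with stable-baselines. A minimal sketch (assuming the standard stable-baselines PPO2 API; the path follows the database layout mentioned above and may need adjusting):

```python
from stable_baselines import PPO2

# Path follows the database layout described in this thread; adjust as needed.
model = PPO2.load("database/agents/ppo2_foo/ppo2_foo.zip")
print(model.observation_space, model.action_space)
```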

2. Execute trained ppo-agent
2.1 Open first terminal: roscore
2.2 Open second terminal: roslaunch rl_bringup setup.launch ns:="sim1" rl_params:="rl_params_scan"
   The error shown in the terminal is: "Failed to call stepsimulation service"
2.3 Open third terminal: source /venv_p3/bin/activate, then roslaunch rl_agent run_ppo2_agent.launch mode:="train"
   The error shown in the terminal is:
   File "/home/hitsz/drl_local_planner_ws/src/drl_local_planner_ros_stable_baselines/rl_agent/scripts/run_scripts/run_ppo.py", line 152, in num_stacks=int(sys.argv[8]))
   ValueError: invalid literal for int() with base 10: '__name:=run_ppo.py'
2.4 Then I modified run_ppo2_agent.launch from
   node pkg="rl_agent" type="run_ppo.py" name="run_ppo.py" output="screen" args="ppo2_foo CnnPolicy_multi_input_vel2 $(arg mode) 1 0 1 ped"
   to
   node pkg="rl_agent" type="run_ppo.py" name="run_ppo.py" output="screen" args="ppo2_foo CnnPolicy_multi_input_vel2 $(arg mode) 1 0 1 ped 4" /
   and the error from 2.3 is solved.
2.5 However, another error happens in src/drl_local_planner_ros_stable_baselines/rl_agent/src/rl_agent/env_wrapper/ros_env_disc_img_vel.py, line 36:
   img_width = rospy.get_param("%s/rl_agent/img_width_pos" % ns) + rospy.get_param("%s/rl_agent/img_width_neg" % ns)
   The error shown in the terminal is:
   File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/client.py", line 465, in get_param
   return _param_server[param_name] #MasterProxy does all the magic for us
   File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/msproxy.py", line 123, in __getitem__
   raise KeyError(key)
   KeyError: 'sim1/rl_agent/img_width_pos'
2.6 I followed your Installation commands to install the repository, and I also ran the commands in "Setup virtual environment to be able to use python3 with ros". But sometimes the code still uses Python 2.7. To work around this Python version problem, I added the sys.path snippet shown right after this list to some of the files.
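The Python-version workaround from step 2.6, written out as a snippet (the paths are specific to my machine; removing the Kinetic dist-packages entry is only safe in scripts that should run purely inside the Python 3 virtualenv):

```python
import sys

# Keep ROS Kinetic's Python 2.7 packages from shadowing the Python 3 virtualenv.
sys.path.remove('/opt/ros/kinetic/lib/python2.7/dist-packages')
sys.path.append('/home/hitsz/.environments/drl_local_planner/lib/python3.5')
```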

I didn't modify any other code. From the GitHub issue https://github.com/RGring/drl_local_planner_ros_stable_baselines/issues/2, I know that another person has run your repository successfully. However, I still ran into these problems.

Thanks a lot!

RGring commented 4 years ago

Thank you for your detailed description of the error. I optimized the training speed, integrated it into Docker, and cleaned up the repository, so I made some bigger changes. Unfortunately, I forgot to update the run_ppo.py script accordingly. I have resolved the bug and hope that you can run your agent now! Please test it and give feedback.

A hint regarding training speed: in the future it would make sense to train in parallel simulation environments ("sim1", "sim2", "sim3"). For an easy setup, I refer you to the Docker section.

liuqi8827 commented 4 years ago

Thanks for your quick reply.

Yes, it works!

Thanks a lot!