ROBOTIS-GIT / turtlebot3_machine_learning

Apache License 2.0

Question: Go a little further #20

Closed ColinJou closed 4 years ago

ColinJou commented 5 years ago

Hello everyone, I have just finished the tutorial on training the TurtleBot3 in Gazebo, specifically on world 3, "Moving Obstacle". Now I would like to know whether it is possible to test the result of this training easily enough, perhaps by creating a new world with new obstacle routes and loading the last .h5 file obtained to test its abilities. I should point out that I unfortunately don't know much about machine learning or the use of all these programs. :/ Thank you in advance for your help!

kijongGil commented 5 years ago

Yes, if you have the .h5 file, you can apply it to a new world with new obstacles (see the evaluation sketch below). Unfortunately, I can't explain machine learning or the use of all these programs to you here. If you want to create a new obstacle, please refer to our Gazebo environment.

Thanks, Gilbert.
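
For reference, a minimal evaluation sketch is below. It is not an official node of this repository; the import path, node name, and checkpoint filename are assumptions, and it assumes the .h5 file was saved with model.save() during training:

```python
#!/usr/bin/env python
# Minimal evaluation sketch, NOT an official node: load a trained model
# and drive the robot greedily in the Gazebo environment. The import
# path, node name, and checkpoint filename below are assumptions.
import numpy as np
import rospy
from keras.models import load_model
from src.turtlebot3_dqn.environment_stage_3 import Env  # assumed path

rospy.init_node('turtlebot3_dqn_test')

action_size = 5                         # the 5 discrete steering actions
env = Env(action_size)
model = load_model('stage_3_ep600.h5')  # hypothetical checkpoint name

state = env.reset()
done = False
while not done and not rospy.is_shutdown():
    # pick the action with the highest predicted Q-value (no exploration)
    q_values = model.predict(np.asarray(state).reshape(1, -1))
    action = int(np.argmax(q_values[0]))
    state, reward, done = env.step(action)
```

The key difference from training is that the action is always the argmax of the predicted Q-values, with no epsilon-greedy exploration.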

ColinJou commented 5 years ago

Thank you for your answer! I have tried some improvements to the reward formula, but none of them have been effective, unfortunately (I am still in the process of learning machine learning). Now I would like to try adding a sixth action, for example a "stop" that the robot could perform instead of turning or going straight ahead. Unfortunately, I couldn't find the part of the code that manages this, except for the "action state", which only manages the graphical part. Is it possible to add a sixth action?

kijongGil commented 5 years ago

Hi @ColinJou, yes, you can add a sixth action.

First, change action_size to 6:
https://github.com/ROBOTIS-GIT/turtlebot3_machine_learning/blob/017741602c4356827eb09ca5caa2c84dc01d74fa/turtlebot3_dqn/nodes/turtlebot3_dqn_stage_1#L152

If you understand the reward formula in this code, you will see that the angular velocity and the reward are determined by the value of the action: if the action value is 0, vel_cmd.angular.z is 1.5, and if the action value is 4, vel_cmd.angular.z is -1.5.
https://github.com/ROBOTIS-GIT/turtlebot3_machine_learning/blob/017741602c4356827eb09ca5caa2c84dc01d74fa/turtlebot3_dqn/src/turtlebot3_dqn/environment_stage_1.py#L122

If you want a 'stop' action, you have to set vel_cmd.linear.x = 0 when the action value is the stop action. You also have to add a 'stop' reward:
https://github.com/ROBOTIS-GIT/turtlebot3_machine_learning/blob/017741602c4356827eb09ca5caa2c84dc01d74fa/turtlebot3_dqn/src/turtlebot3_dqn/environment_stage_1.py#L92
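
For illustration, here is a minimal sketch of that action-to-velocity mapping with a stop action added. The function name action_to_twist and the choice of index 5 for 'stop' are assumptions for the example, not the repository's code:

```python
# Minimal sketch (not the repository's exact code): map the 6 discrete
# actions to a Twist, following the pattern of step() in
# environment_stage_1.py. Treating index 5 as 'stop' is an assumption.
from geometry_msgs.msg import Twist

STOP_ACTION = 5          # hypothetical index for the new 'stop' action
MAX_ANGULAR_VEL = 1.5    # matches max_angular_vel in the original step()

def action_to_twist(action):
    vel_cmd = Twist()
    if action == STOP_ACTION:
        # new behavior: stand still instead of turning or driving forward
        vel_cmd.linear.x = 0.0
        vel_cmd.angular.z = 0.0
    else:
        # original 5-action mapping: 0 -> +1.5 rad/s, 2 -> 0.0, 4 -> -1.5 rad/s
        vel_cmd.linear.x = 0.15
        vel_cmd.angular.z = ((5 - 1) / 2.0 - action) * MAX_ANGULAR_VEL * 0.5
    return vel_cmd
```

As described above, setReward() in environment_stage_1.py would also need a reward case for the new action index, so that stopping is neither always punished nor free.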

ColinJou commented 5 years ago

Thank you very much for your help! I finally managed to get the addition of a sixth action to compile (the results are not yet up to my expectations ^^). I still have two more questions due to my lack of knowledge:

  1. What does the second graph displayed during training represent?
  2. How is the "state" used by the network defined in this code?

kijongGil commented 5 years ago
  1. The second graph is the Q-value. If you want to know the meaning of the Q-value, I recommend reading the papers or searching Google. Simply put, the higher the Q-value, the better the expected result (see the sketch after this list).
  2. This state is just a custom state. As you may know, in the Atari games the state is an image. I created this code by referring to https://github.com/floodsung/DQN-Atari-Tensorflow
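
To make both points concrete, here is a small self-contained sketch; the state layout and the stand-in model are illustrative assumptions, not the repository's exact code:

```python
# Illustrative sketch (names are assumptions, not the repository's code):
# a "custom state" is just a numeric vector, and the plotted Q-value is
# the network's estimate of future reward for the best action in a state.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical custom state: 24 downsampled laser ranges plus heading and
# distance to the goal, similar in spirit to the state in turtlebot3_dqn.
scan_ranges = np.random.uniform(0.2, 3.5, size=24)   # stand-in sensor data
heading, goal_distance = 0.1, 1.8
state = np.concatenate([scan_ranges, [heading, goal_distance]])

# Untrained stand-in for the DQN, just to make the example self-contained.
q_model = Sequential([Dense(64, activation='relu', input_shape=(26,)),
                      Dense(6)])                      # 6 actions

q_values = q_model.predict(state.reshape(1, -1))[0]
print('max Q:', q_values.max())      # the quantity plotted during training
print('greedy action:', q_values.argmax())
```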
JaehyunShim commented 4 years ago

There hasn't been any reply from the questioner, so I am closing this issue.