araffin / learning-to-drive-in-5-minutes

Implementation of reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License
287 stars, 85 forks

[Question]: The pretrained agent for Level 1 always turning left #12

Open bobbyhaliwela opened 5 years ago

bobbyhaliwela commented 5 years ago

As the title says, the pretrained agent always turns left on my machine, and the episode ends when the car touches the yellow striped lines. I also tried training it from scratch, with no success: once training finished, the car drove in anti-clockwise circles. However, everything works great for level 0. My question is: does this only happen on my machine, or did I miss a change in the config? Has anyone else had the same experience?

araffin commented 5 years ago

Hello, please fill in the issue template. Also, check that you are using the right VAE model.

bobbyhaliwela commented 5 years ago

I've run some tests again: first, using the pre-trained agent and the pre-trained VAE for level 1; second, training from scratch using the pre-trained VAE for level 1. Before both, I followed the steps for reproducing the results and edited config.py:

MAX_STEERING_DIFF = 0.15
MAX_THROTTLE = 0.5 # MAX_THROTTLE = 0.6 can work but it's harder to train due to the sharpest turn
LEVEL = 1
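For readers unfamiliar with these settings, here is a minimal sketch of what a steering-difference limit such as MAX_STEERING_DIFF usually means: the steering command may change by at most that amount between two consecutive steps. This is an illustration under that assumption, not the repository's actual code; the helper name `limit_steering` is hypothetical.

```python
def limit_steering(prev_steering, requested_steering, max_diff=0.15):
    """Clamp the per-step change in steering (hypothetical helper).

    max_diff plays the role of MAX_STEERING_DIFF from config.py.
    """
    diff = requested_steering - prev_steering
    # Keep the change within [-max_diff, +max_diff]
    diff = max(-max_diff, min(max_diff, diff))
    return prev_steering + diff


# A hard-left request from a neutral wheel is applied gradually
steering = 0.0
for _ in range(3):
    steering = limit_steering(steering, -1.0)
```

With a limit like this, a policy that keeps requesting full left lock still reaches it within a few steps, so the limit smooths the steering but cannot by itself prevent a persistent left bias.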

Then I executed this command to test the pre-trained agent with the pre-trained VAE:

python -m teleop.teleop_client --algo sac -vae logs/sac/vae-level-1-dim-64.pkl --exp-id 6

The agent no longer drives in circles, but it changed lanes when I switched it to autonomous mode. It turned left, crossing the yellow striped lines, then stayed on the track until the first right turn. There it made a soft right turn before running straight off the track. I only tested this twice, and both tests gave the same result. I then proceeded to train from scratch by executing this command:

python train.py --algo sac -n 15000 -vae logs/vae-level-1-dim-64.pkl --teleop

I ended the episode every time the agent got close to the yellow striped lines or the white lines. After a few seconds, the agent was able to drive, staying very close to the yellow striped lines as if it were following them. But after a few mistakes, it just kept turning left and eventually drove in circles. This is the output from when the agent started driving in circles:

-----------------------------------------
| current_lr              | 0.0003      |
| ent_coef                | 0.61908376  |
| ent_coef_loss           | 0.004652093 |
| entropy                 | 1.8176327   |
| ep_rewmean              | nan         |
| episodes                | 50          |
| eplenmean               | nan         |
| fps                     | 10          |
| mean 100 episode reward | -482        |
| n_updates               | 27600       |
| policy_loss             | 177.12634   |
| qf1_loss                | 111.92655   |
| qf2_loss                | 109.09445   |
| time_elapsed            | 426.71      |
| total timesteps         | 4429        |
| value_loss              | 26.78433    |
-----------------------------------------
Episode finished. Reward: -109.32 33 Steps
SAC training duration: 1.33s
Waiting for teleop
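To make the numbers in this log easier to compare, here is a small sketch that parses rows of the form `| key | value |` and derives two ratios from them. The helper `parse_log_table` is hypothetical and written only for this output format; the values are copied from the log above.

```python
def parse_log_table(text):
    """Parse '| key | value |' rows into a dict (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip separator rows and free-text lines
        parts = [p.strip() for p in line.strip("|").split("|")]
        if len(parts) != 2:
            continue
        key, raw = parts
        try:
            stats[key] = float(raw)  # 'nan' also parses as a float
        except ValueError:
            stats[key] = raw
    return stats


LOG = """
| ep_rewmean              | nan         |
| episodes                | 50          |
| n_updates               | 27600       |
| total timesteps         | 4429        |
"""

stats = parse_log_table(LOG)
steps_per_episode = stats["total timesteps"] / stats["episodes"]
updates_per_step = stats["n_updates"] / stats["total timesteps"]
print(f"{steps_per_episode:.2f} steps/episode, {updates_per_step:.2f} updates/step")
```

On these values the average episode lasts about 88.6 steps, while the last reported episode ended after 33 steps, which is consistent with the agent getting worse rather than better at that point in training.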

I did not change anything besides config.py. This is my OS and hardware information:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:    18.04
Codename:   bionic
H/W path                 Device           Class          Description
====================================================================
                                          system         All Series (All)
/0                                        bus            MAXIMUS VII RANGER
/0/0                                      memory         64KiB BIOS
/0/40                                     memory         16GiB System Memory
/0/40/0                                   memory         DIMM [empty]
/0/40/1                                   memory         8GiB DIMM DDR3 Synchron
/0/40/2                                   memory         DIMM [empty]
/0/40/3                                   memory         8GiB DIMM DDR3 Synchron
/0/4d                                     processor      Intel(R) Core(TM) i7-47
/0/4d/4e                                  memory         256KiB L1 cache
/0/4d/4f                                  memory         1MiB L2 cache
/0/4d/50                                  memory         8MiB L3 cache
/0/100                                    bridge         4th Gen Core Processor 
/0/100/1                                  bridge         Xeon E3-1200 v3/4th Gen
/0/100/1/0                                display        GP104
/0/100/1/0.1                              multimedia     GP104 High Definition A
/0/100/2                                  display        Xeon E3-1200 v3/4th Gen
/0/100/3                                  multimedia     Xeon E3-1200 v3/4th Gen
....

I will try changing ent-coef in the next test since, as I understand it, the original SAC paper suggests this is the hyperparameter that needs fine-tuning (correct me if I'm wrong). Do you have any other suggestions?
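As background for the ent-coef question, here is a minimal sketch of the automatic temperature (entropy coefficient) objective described in the follow-up SAC work, J(alpha) = E[-log(alpha) * (log pi(a|s) + target_entropy)]. It assumes the `ent_coef_loss` row in the log above comes from this kind of automatic tuning and that the action space has two dimensions (steering and throttle), giving a target entropy of -2. The function names and sample log-probabilities are made up for illustration.

```python
def ent_coef_loss(log_ent_coef, log_probs, target_entropy):
    """Temperature loss: mean of -log_alpha * (log_pi + target_entropy)."""
    terms = [log_ent_coef * (lp + target_entropy) for lp in log_probs]
    return -sum(terms) / len(terms)


def ent_coef_gradient(log_probs, target_entropy):
    """Derivative of the loss w.r.t. log_alpha.

    Positive: gradient descent lowers the coefficient (policy is more
    random than the target). Negative: the coefficient is raised.
    """
    return -sum(lp + target_entropy for lp in log_probs) / len(log_probs)


# Assumption: 2 action dimensions, so target entropy = -2
target_entropy = -2.0
log_probs = [-1.5, -2.0, -1.8]  # made-up log pi(a|s) samples

grad = ent_coef_gradient(log_probs, target_entropy)
```

With samples like these the gradient is positive, so automatic tuning would keep shrinking the coefficient. A fixed coefficient removes that feedback loop, which is the trade-off to keep in mind when setting it by hand.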