Closed: juliayun23 closed this issue 3 years ago
Hi @nomatterhoe, thank you for the feedback, that's interesting. For this roundabout environment, I have actually only tried model-based (planning) approaches so far, never model-free algorithms. I'll see if I can reproduce the conservative behaviours that you get.
Hypothesis: the model has trouble figuring out whether or not a vehicle is coming, and settles for the safe option. This could be alleviated by a different choice of observation type and neural network architecture, see e.g. this work on the intersection environment.
That being said, I just noticed that the default observation for the roundabout is the TimeToCollision observation, which is not appropriate since it was designed for straight roads. I just changed it to a Kinematics observation, which is better suited (OccupancyGrid could work as well).
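For reference, the observation can also be overridden per run rather than waiting for the default to change; a minimal sketch, assuming highway-env's config-dict conventions and the observation type names above:

```python
# Hypothetical config override for the roundabout observation type,
# following highway-env's {"observation": {"type": ...}} convention.
config = {
    "observation": {
        "type": "Kinematics",  # "OccupancyGrid" could work as well
    }
}

# With highway-env installed, this would be applied roughly as:
#   import gym, highway_env
#   env = gym.make("roundabout-v0")
#   env.configure(config)
#   env.reset()  # the new observation takes effect on reset
```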
Thanks for your reply @eleurent. Following your suggestion, I tried the roundabout env with both the OccupancyGrid and Kinematics observations, and the freezing problem was solved in both cases. But it seems like the training curve won't converge (below is the roundabout env with the Kinematics observation, trained using PPO2).
The training curves also fail to converge in the intersection env when I train the vehicle using DQN and PPO2; I tried adjusting the learning rate and total_timesteps, but it didn't make a difference. I had a look at your paper, and it seems you customized the DQN algorithm for the intersection env by adding attention layers to your network architecture. So my new question is: in your opinion, is it possible at all for a baseline algorithm to perform/learn well in such complex driving environments as the roundabout and intersection?
> But it seems like the training curve won't converge
It seems to me that training does converge (i.e. to a local maximum) in about 100k steps? You may be troubled by these downward spikes: they probably correspond to accidents.
> The training curves also can't converge in Intersection env when I trained the vehicle using dqn and ppo2, I tried adjusting learning rate and total_timesteps, didn't make differences. I had a look at your paper, it seems like you customized the DQN algorithm for the intersection env by adding attention layers in your network architecture.
First, note that the curves reported in the social attention paper show the reward averaged over many random seeds, not a single training run. This explains why the spikes are not visible there, but they were still present in individual runs.
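To illustrate with synthetic numbers (not real training data): averaging per-seed curves shrinks isolated crash spikes, since a crash rarely happens at the same training step across seeds.

```python
import numpy as np

# Illustrative only: fake per-seed return curves with occasional crash dips.
rng = np.random.default_rng(0)
n_seeds, n_points = 5, 100
trend = 1.0 - np.exp(-np.linspace(0, 5, n_points))  # generic learning curve
runs = trend + 0.05 * rng.standard_normal((n_seeds, n_points))

# Random downward spikes, standing in for collision episodes:
crashes = rng.random((n_seeds, n_points)) < 0.05
runs[crashes] -= 1.0

# Averaging over seeds (what the paper plots) smooths the isolated spikes:
mean_curve = runs.mean(axis=0)
```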
> So my new question is, in your opinion, is it possible at all that a baseline algorithm can perform/learn well under these complex driving environments such as roundabout and intersection?
It depends on what you call perform/learn well. If you mean e.g. reaching a 0% collision rate while still being able to cross the intersection/roundabout (i.e. no freezing robot), then no, I haven't observed such successes with baseline algorithms and architectures :/ I've mostly tried:
but did not experiment much with policy gradients.
Perhaps surprisingly, it seems that even for such simple simulations, baseline algorithms / architectures are not sufficient.
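Concretely, "perform well" in the sense above can be measured at evaluation time by separating crashes from freezing-robot episodes; a minimal sketch (the helper name and the distance threshold are mine; in highway-env the crash flag can be read from the step `info` dict):

```python
def success_metrics(episodes, min_distance=20.0):
    """Hypothetical helper: split evaluation episodes into crashes,
    freezes (the vehicle barely moved), and successes.

    `episodes` is a list of (crashed: bool, distance: float) pairs.
    """
    n = len(episodes)
    crashes = sum(1 for crashed, _ in episodes if crashed)
    freezes = sum(1 for crashed, dist in episodes
                  if not crashed and dist < min_distance)
    return {
        "collision_rate": crashes / n,
        "freeze_rate": freezes / n,
        "success_rate": (n - crashes - freezes) / n,
    }
```

A policy that "solves" the env should drive both `collision_rate` and `freeze_rate` to zero; a freezing robot trades one for the other.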
Hi @eleurent, I've recently been exploring your excellent project. I used PPO2 from stable-baselines to train a vehicle in the roundabout env, and one weird phenomenon I noticed was that the trained vehicle tends to freeze at the entrance of the roundabout (it stops moving). This happened quite often when I evaluated the trained model. I'm not sure whether it's a hyperparameter problem, but I didn't change the hyperparameters much. Do you have any idea why this happens, or a suggestion on how to solve it?
My testing code was like this:
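Something along these lines (a minimal sketch, assuming stable-baselines' PPO2, the 4-tuple gym step API, and a made-up saved-model path):

```python
# Sketch of the evaluation loop. With the libraries installed, the
# model and env would be built roughly as:
#   import gym, highway_env
#   from stable_baselines import PPO2
#   model = PPO2.load("ppo2_roundabout")   # placeholder path
#   env = gym.make("roundabout-v0")

def evaluate(model, env, n_episodes=10):
    """Roll out a trained policy and collect per-episode returns."""
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            # PPO2.predict returns (action, recurrent_states)
            action, _states = model.predict(obs, deterministic=True)
            obs, reward, done, info = env.step(action)
            total += reward
        returns.append(total)
    return returns
```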
Thanks for your help.