Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

Issues with training in the intersection.env environment #509

Closed by huang6668 1 year ago

huang6668 commented 1 year ago

When I train with the OccupancyGrid observation and a CNN, after about 1000 episodes the agent stops moving and stays still until the episode is truncated, and the reward is always 0. This does not happen with the Kinematics observation. I don't know what causes this. Could it be related to my reward settings? Is my collision penalty too large?

eleurent commented 1 year ago

I think that is a tricky question to answer, as the behaviour of the agent results from the interplay of many parts. For the agent to be less conservative, it needs to estimate that going forward has a high value. For that, it needs to a) have examples of such trajectories and b) have a model that correctly predicts the future rewards for these trajectories. Maybe the change of observation makes b) slower to converge, which means the behaviour collapses too early and positive examples are missing from a) or something?

But you're right, changing the reward trade-offs can definitely help if collisions are currently penalised too drastically, so I'd definitely advise trying that. And maybe play with the exploration/model params to improve a)/b) if you think one of them is the limiting factor.
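To make the reward trade-off concrete, here is a rough sketch that compares the expected return of attempting to cross with the return of standing still. The penalty, arrival reward, and collision probability used below are illustrative assumptions, not values taken from this issue or measured in the intersection environment.

```python
def crossing_value(p_collision: float, collision_reward: float, arrived_reward: float) -> float:
    """Expected one-shot return of attempting to cross the intersection."""
    return p_collision * collision_reward + (1.0 - p_collision) * arrived_reward


def break_even_collision_probability(collision_reward: float, arrived_reward: float) -> float:
    """Collision probability above which stopping (return 0) beats crossing."""
    return arrived_reward / (arrived_reward - collision_reward)


STOP_VALUE = 0.0  # a stopped agent collects no reward, as reported above

# Hypothetical settings: a harsh collision penalty versus a milder one.
harsh = break_even_collision_probability(collision_reward=-5.0, arrived_reward=1.0)
mild = break_even_collision_probability(collision_reward=-1.0, arrived_reward=1.0)
print(f"harsh penalty: crossing pays off only if p_collision < {harsh:.3f}")
print(f"mild penalty:  crossing pays off only if p_collision < {mild:.3f}")
```

Under these assumed numbers, an agent that believes it collides 30% of the time prefers stopping with the harsh penalty and crossing with the mild one, which is one way a conservative policy can emerge early in training.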

huang6668 commented 1 year ago

Okay, thank you for your answer. I will follow your advice and keep trying. I have another question: in OccupancyGrid, does grid_step refer to the size of the grid? If so, why doesn't a CAV occupy two cells when I set "grid_step": [2, 2]? As far as I know, the size of a CAV should be 5 meters * 2 meters.

eleurent commented 1 year ago

> In occupancygrid, does grid_step refer to the size of the grid?

grid_step refers to the size of a grid cell, in meters, so [2,2] = 2m x 2m.
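As a rough illustration of how the cell size determines the grid resolution, the sketch below computes the number of cells per axis from an extent and a step. The extent of [-27.5, 27.5] meters and the use of flooring are assumptions made for the example; check the environment's actual configuration for the real values.

```python
import math


def grid_shape(grid_size, grid_step):
    """Number of cells per axis for an extent [[x_min, x_max], [y_min, y_max]] in meters."""
    return tuple(
        math.floor((high - low) / step)
        for (low, high), step in zip(grid_size, grid_step)
    )


extent = [[-27.5, 27.5], [-27.5, 27.5]]  # assumed extent, in meters
coarse = grid_shape(extent, [5, 5])  # 5 m x 5 m cells
fine = grid_shape(extent, [2, 2])    # 2 m x 2 m cells
print(coarse, fine)
```

With this assumed extent, halving the step roughly doubles the number of cells per axis, so the observation fed to the network grows accordingly.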

> If so, why doesn’t a CAV occupy two grids when I set grid_step to "grid_step": [2, 2]? As far as I know, the size of a CAV should be 5 meters * 2 meters

You are correct, but the reason is that for now only the center point of the vehicle is added to the grid, not the whole rectangle. If you think this is not sufficient for accurate decision making (and you may be right), this code could be adapted to include the cells enclosed by the 4 corners, like what I did for the LidarObservation.
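The difference between marking only the center point and marking the whole footprint can be sketched as follows. This is a simplified illustration, not the library's code: it ignores the vehicle heading (the rectangle is axis-aligned), and all function names are hypothetical.

```python
import math


def cell_index(position, grid_min, grid_step):
    """Map a 2D position in meters to integer (i, j) cell indices."""
    return tuple(math.floor((p - m) / s) for p, m, s in zip(position, grid_min, grid_step))


def center_cells(center, grid_min, grid_step):
    """Current behaviour described above: only the center point is marked."""
    return {cell_index(center, grid_min, grid_step)}


def footprint_cells(center, length, width, grid_min, grid_step):
    """Cells overlapped by an axis-aligned length x width rectangle (heading ignored).

    Simplification: an edge lying exactly on a cell boundary also marks the next cell.
    """
    half = (length / 2, width / 2)
    low = cell_index((center[0] - half[0], center[1] - half[1]), grid_min, grid_step)
    high = cell_index((center[0] + half[0], center[1] + half[1]), grid_min, grid_step)
    return {(i, j) for i in range(low[0], high[0] + 1) for j in range(low[1], high[1] + 1)}


# A 5 m x 2 m vehicle on a grid of 2 m x 2 m cells whose corner is at the origin.
point = center_cells((11.0, 4.5), (0.0, 0.0), (2.0, 2.0))
area = footprint_cells((11.0, 4.5), 5.0, 2.0, (0.0, 0.0), (2.0, 2.0))
print(len(point), len(area))
```

A full version would rotate the four corners by the vehicle heading before rasterizing, as suggested in the comment above.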

huang6668 commented 1 year ago

Got it, thank you for the response.

yshichseu commented 11 months ago

Hello, I would like to know how you used the occupancy grid with CnnPolicy. Did you redefine it yourself? I used the default occupancy grid configuration and trained with SB3, but there was an error about the convolution kernel being too large. However, using MLP would not work.

huang6668 commented 10 months ago

I used Tianshou for training. SB3 should be similar: replacing the MLP with a CNN should suffice. As for the error about the convolution kernel being too large, you may need to modify the parameters of the CNN.
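One likely cause of the kernel-size error can be checked with plain arithmetic. The sketch below traces the spatial size of a square input through a stack of convolutions. The (kernel, stride) pairs of 8/4, 4/2, 3/1 are those of the NatureCNN extractor that SB3's CnnPolicy uses by default; the 11x11 input assumes a default-sized occupancy grid, and the smaller layer stack is a hypothetical alternative.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_cnn(size, layers):
    """Return the spatial size after each (kernel, stride) layer.

    Raises ValueError as soon as a kernel no longer fits its input.
    """
    sizes = []
    for kernel, stride in layers:
        if kernel > size:
            raise ValueError(f"kernel {kernel} is larger than the {size}x{size} input")
        size = conv_output_size(size, kernel, stride)
        sizes.append(size)
    return sizes


NATURE_CNN = [(8, 4), (4, 2), (3, 1)]  # (kernel, stride), designed for 84x84 images
SMALL_CNN = [(3, 1), (3, 1)]           # hypothetical stack for small grids

print(trace_cnn(84, NATURE_CNN))  # fits an Atari-sized image
print(trace_cnn(11, SMALL_CNN))   # fits an assumed 11x11 occupancy grid
try:
    trace_cnn(11, NATURE_CNN)
except ValueError as error:
    print("NatureCNN on 11x11:", error)
```

In SB3, a smaller network can be supplied as a custom features extractor through `policy_kwargs` (`features_extractor_class`); alternatively, a finer `grid_step` or a larger grid extent makes the input big enough for the default kernels.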

yshichseu commented 10 months ago

> I used Tianshou for training. SB3 should be similar, replacing MLP with CNN should suffice. “but there was an error of the convolution kernel being too large.” Maybe you need to modify the parameters of the CNN network.

Thank you for your reply. I will keep experimenting. Meanwhile, when using grayscale image observations with a CNN, I ran into a problem similar to yours. Changing the reward function seems to help: the vehicle now keeps moving forward, but it still stops at intersections, and I believe the collision penalty is already small enough. Have you solved this problem, and what should I improve?

huang6668 commented 10 months ago

I had a problem with the network parameters at that time; after resetting them, the problem no longer occurred. By the way, you can also try different seeds. I have seen training fail to converge with one particular random seed, and the problem went away after switching to other seeds.
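A minimal sketch of how the seed sensitivity mentioned here could be checked systematically: run the same training routine over several seeds and list the ones that end below a chosen reward threshold. `train_fn` is a hypothetical callable standing in for a full training run, and the threshold is an assumption to be chosen per task.

```python
def evaluate_seeds(train_fn, seeds, success_threshold):
    """Run train_fn(seed) for each seed and report which seeds fall short.

    train_fn is a hypothetical stand-in: it should train an agent with the
    given seed and return a scalar score such as the final mean episode reward.
    """
    results = {seed: train_fn(seed) for seed in seeds}
    failed = [seed for seed, score in results.items() if score < success_threshold]
    return results, failed


# Usage with a fake training function; replace it with a real training run.
fake_scores = {0: 0.0, 1: 4.2, 2: 3.9}
results, failed = evaluate_seeds(lambda seed: fake_scores[seed], [0, 1, 2], success_threshold=1.0)
print("seeds that did not converge:", failed)
```

If most seeds succeed and only a few fail, the setup is probably sound and the failures are bad luck; if most fail, the network or reward settings deserve another look.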

yshichseu commented 10 months ago

> I had a problem with the network parameters at that time, and after resetting the network parameters, the problem did not occur. By the way, you can also try different seeds. I have encountered a situation where training does not converge when using a certain random number seed, but I have not encountered this situation after replacing other seeds

Thanks! I will try it!:)

yshichseu commented 10 months ago

> I had a problem with the network parameters at that time, and after resetting the network parameters, the problem did not occur. By the way, you can also try different seeds. I have encountered a situation where training does not converge when using a certain random number seed, but I have not encountered this situation after replacing other seeds

I still seem to be running into problems: the vehicle is unable to learn a useful policy with the CNN. Could you please share your settings? My email is 1504190470@qq.com. Thank you very much!