I think that is a tricky question to answer, as the behaviour of the agent results from the interplay of many parts. For the agent to be less conservative, it needs to estimate that going forward has a high value. For that, it needs to (a) have examples of such trajectories and (b) have a model that correctly predicts the future rewards for these trajectories. Maybe the change of observation makes (b) slower to converge, which means the behaviour collapses too early and the positive examples needed for (a) go missing?
But you're right, changing the reward trade-offs can definitely help if collisions are currently penalised too heavily, so I'd definitely advise trying that. And maybe play with the exploration/model parameters to improve (a)/(b) if you think one of them is the limiting factor.
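For example, the reward weights can be overridden through the environment config. A rough sketch, assuming the intersection environment and its default config keys (check your version's default_config for the exact names); the values are purely illustrative, not tuned:

```python
# Rough sketch: softening the collision penalty relative to the progress rewards
# in highway-env. Key names follow the default intersection-v0 config; the values
# below are illustrative only.
import gymnasium as gym
import highway_env  # noqa: F401  (some versions need highway_env.register_highway_envs())

env = gym.make("intersection-v0")
env.unwrapped.configure({
    "collision_reward": -1.0,   # make the collision penalty less dominant...
    "arrived_reward": 2.0,      # ...relative to reaching the goal
    "high_speed_reward": 1.0,   # and to keeping a reasonable speed
})
obs, info = env.reset()  # the new config takes effect after reset()
```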
Okay, thank you for your answer. I will try to follow your advice and continue. I have another question: in the OccupancyGrid observation, does grid_step refer to the size of a grid cell? If so, why doesn't a CAV occupy more than one cell when I set "grid_step": [2, 2]? As far as I know, a CAV should be about 5 m x 2 m.
In the OccupancyGrid observation, does grid_step refer to the size of a grid cell?
grid_step refers to the size of a grid cell, in meters, so [2,2] = 2m x 2m.
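For reference, a typical OccupancyGrid observation config looks roughly like this (key names as in the highway-env documentation; the features, ranges, and environment id are up to you):

```python
# Illustrative OccupancyGrid config. With a grid_size of [-27.5, 27.5] on both
# axes and a grid_step of [5, 5], the grid has 11 x 11 cells.
import gymnasium as gym
import highway_env  # noqa: F401

env = gym.make("intersection-v0")
env.unwrapped.configure({
    "observation": {
        "type": "OccupancyGrid",
        "features": ["presence", "vx", "vy"],
        "grid_size": [[-27.5, 27.5], [-27.5, 27.5]],  # extent around the ego vehicle [m]
        "grid_step": [5, 5],                          # cell size [m]
        "absolute": False,
    }
})
obs, info = env.reset()
print(obs.shape)  # (n_features, rows, cols), e.g. (3, 11, 11) with the values above
```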
If so, why doesn't a CAV occupy more than one cell when I set "grid_step": [2, 2]? As far as I know, a CAV should be about 5 m x 2 m.
You are correct, but the reason is that for now only the center point of the vehicle is added to the grid, not the whole rectangle. If you think this is not sufficient for accurate decision making (and you may be right), this code could be adapted to include the cells enclosed by the 4 corners, like what I did for the LidarObservation.
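For anyone who wants to experiment with that, here is a brute-force sketch of the idea: mark every cell whose centre falls inside the vehicle's rotated rectangle. This is not the library's actual code, just an illustration; names and conventions are made up.

```python
# Brute-force illustration (not highway-env's code): mark every grid cell whose
# centre lies inside a vehicle's rotated rectangle, instead of only the cell
# containing the vehicle's centre point.
import numpy as np


def fill_vehicle_footprint(grid, origin, step, position, heading,
                           length=5.0, width=2.0):
    """grid: 2D array of cells; origin: world coordinates of the corner of cell
    (0, 0); step: (dx, dy) cell size in meters; position/heading: vehicle pose."""
    cos_h, sin_h = np.cos(heading), np.sin(heading)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            # centre of cell (i, j) in world coordinates
            cx = origin[0] + (i + 0.5) * step[0]
            cy = origin[1] + (j + 0.5) * step[1]
            # express the cell centre in the vehicle's local frame
            dx, dy = cx - position[0], cy - position[1]
            lon = cos_h * dx + sin_h * dy    # along the vehicle's heading
            lat = -sin_h * dx + cos_h * dy   # across the vehicle
            if abs(lon) <= length / 2 and abs(lat) <= width / 2:
                grid[i, j] = 1.0
    return grid


# e.g. an 11 x 11 grid of 2 m cells centred on the ego vehicle
grid = fill_vehicle_footprint(np.zeros((11, 11)), origin=(-11.0, -11.0),
                              step=(2.0, 2.0), position=(0.0, 0.0),
                              heading=np.pi / 4)
```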
Got it, thank you for the response.
Hello, I would like to know how you used the occupancy grid observation with CnnPolicy. Did you define the network yourself? I used the default occupancy grid configuration and trained with SB3, but I got an error about the convolution kernel being too large for the input, and using an MLP did not work either.
I used tianshou for training. SB3 should be similar; replacing the MLP with a CNN should suffice. "but there was an error of the convolution kernel being too large" Maybe you need to modify the parameters of the CNN network.
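A rough sketch of what that could look like in SB3 (the class name and hyper-parameters are illustrative, not the settings used by anyone in this thread). It assumes the OccupancyGrid observation, whose grid is only about a dozen cells wide, so the default NatureCNN kernels (8x8, stride 4) are too large for the input:

```python
# Sketch: a small-kernel CNN features extractor for a small occupancy grid,
# assuming an observation of shape (channels, rows, cols), e.g. (F, 11, 11).
import gymnasium as gym
import highway_env  # noqa: F401
import torch
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class SmallGridCNN(BaseFeaturesExtractor):
    """CNN with 3x3 kernels, sized for a grid of roughly 11 x 11 cells."""

    def __init__(self, observation_space, features_dim=128):
        super().__init__(observation_space, features_dim)
        n_channels = observation_space.shape[0]  # one channel per grid feature layer
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            sample = torch.as_tensor(observation_space.sample()[None]).float()
            n_flatten = self.cnn(sample).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations):
        return self.linear(self.cnn(observations.float()))


env = gym.make("intersection-v0")
env.unwrapped.configure({"observation": {"type": "OccupancyGrid"}})  # default grid params
env.reset()  # reset so the new observation space takes effect

model = PPO(
    "MlpPolicy",  # the custom extractor below replaces the default flatten extractor
    env,
    policy_kwargs=dict(
        features_extractor_class=SmallGridCNN,
        features_extractor_kwargs=dict(features_dim=128),
    ),
    verbose=1,
)
# model.learn(total_timesteps=200_000)
```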
Thank you for your reply, I will keep trying. Meanwhile, when using grayscale image observations and a CNN network, I also ran into a problem similar to yours. Changing the reward function seems to help (the vehicle keeps moving forward, but it still stops at intersections, and I believe the collision penalty is already small enough). Have you solved this problem, and what should I improve?
I had a problem with the network parameters at the time, and after resetting them the problem did not occur. By the way, you can also try different seeds: I have seen training fail to converge with one particular random seed, but not after switching to other seeds.
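For instance (SB3 shown for brevity; with tianshou, set the numpy/torch and environment seeds in the equivalent places):

```python
# Illustrative only: re-run the same training with a few different seeds and compare.
# Replace "CartPole-v1" with your own configured environment.
from stable_baselines3 import PPO

for seed in (0, 1, 2):
    model = PPO("MlpPolicy", "CartPole-v1", seed=seed, verbose=0)
    model.learn(total_timesteps=50_000)
    model.save(f"ppo_seed_{seed}")
```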
Thanks! I will try it! :)
I still seem to be running into problems: the vehicle is unable to learn a useful policy through the CNN network. Could you please share your settings? My email is 1504190470@qq.com. Thank you very much!
When I train with occupancy grids and a CNN, after about 1000 episodes the agent simply stops moving until the episode is truncated, and the reward stays at 0. This does not happen with the Kinematics observation. I don't know what causes this; could it be related to my reward settings? Is my collision penalty set too high?