Hi @skynox03
All these means of accessing the action are supposed to be the same:
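Roughly, as a sketch (a hypothetical interaction loop, assuming highway-env's gym-style 4-tuple step and a generic rl-agents agent, not verbatim project code):

```python
# Sketch of where the same action should show up.
obs, done = env.reset(), False
while not done:
    action = agent.act(obs)  # 1. the agent's decision, i.e. the first action of agent.plan(obs)
    obs, reward, done, info = env.step(action)
    print(action, info["action"])  # 2. the action highway-env reports having executed
    # 3. rl-agents' Evaluation.step() passes this same `action` to env.step() and logs it
```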
So, they are supposed to match. Do you have an example where they don't? (Note that if the agent is not deterministic, the action obtained by calling agent.act() another time may not be the same as the one that was actually obtained by agent.plan() in evaluation.py.)
Regarding the issue of actions that are not available in some states (e.g. changing to the left lane when the vehicle is already on the leftmost lane): OpenAI Gym's interface does not really provide a way of disabling actions, so I just added an env.get_available_actions()
getter that you can use to ensure that you always pick valid actions. But since the gym interface still allows any agent to request an unavailable action, you can simply consider that the transition dynamics for these actions are the same as those of IDLE in such states. I do not think they should be replaced with IDLE labels, however: the agent should be able to learn that in certain states, some actions are equivalent to doing nothing (just like moving towards a wall in a video game).
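As a rough sketch of how that getter can be used (again a hypothetical loop, not verbatim project code):

```python
import random

# Sketch: only sample among the actions available in the current state.
obs, done = env.reset(), False
while not done:
    available = env.get_available_actions()  # e.g. no left-lane change on the leftmost lane
    action = random.choice(available)
    obs, reward, done, info = env.step(action)
```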
Does that make sense?
And env.step() corresponds to the policy frequency. But if the simulation frequency is higher, several intermediate frames will be simulated between two policy steps (for smoother rendering / more accurate physical integration).
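For instance, with a configuration like this (the values are illustrative, not necessarily the defaults):

```python
env.configure({
    "simulation_frequency": 15,  # physics/rendering frames simulated per second
    "policy_frequency": 5,       # env.step() calls per second
})
# Each env.step() then runs 15 / 5 = 3 simulation frames before returning.
```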
First issue: cannot reproduce.
Second issue: won't fix.
Please reopen if you still have problems.
Hello Eleurent,
I tried to save the executed action at each step of an episode. However, when I compare the video to the saved data, sometimes a saved action for a step is not the one executed in the video.
I tried saving the action from different sources: info["action"], agent.act(observation), and the step() method in evaluation.py (rl-agents).
info["action"] and calling it through the def step in evaluation.py - They both match in the output and correspond to the video as well. Just there is a deviation suddenly in between which does not match the video step.
agent.act(observation) - The output from this deviates a lot from the video.
Can you maybe figure out the reason, or suggest a better way to print the executed action during a step?
Another issue I noticed: suppose the ego-vehicle is driving in the leftmost lane. The action printed here should be IDLE, since it is not changing lanes. But sometimes, according to the saved data, the action it gets is LEFT, even though there is no left lane to go to, and it keeps driving straight. That is why I am a little confused; it would help if you could tell me the command to print the executed action, that is, the action that the ego-vehicle actually performs in the highway simulator.
Also, does env.step (which is used to save the step) correspond to the simulation frequency or the policy frequency?