Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

Some questions related to highway_planning ipynb script #94

Closed: sparshgarg23 closed this issue 4 years ago

sparshgarg23 commented 4 years ago

Hi, I was going through the highway planning script, which uses deterministic planning. I am new to reinforcement learning, so I know that how we set gamma will influence how the agent weighs rewards. On setting gamma to 1.0 I received a division by zero error. Any ideas as to why this is happening?

Secondly, when I change the environment to Intersection-v0 and the agent to DQN, I received an error of the following format: "Trying to step environment which is currently done. While the monitor is active for intersection-v0, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode." Any ideas as to what could be the cause, and what can be done to work around it? Do you think we should add the line env.reset() after we get the reward for one particular step?

I am new to this field and have a deep interest in decision-making for autonomous driving, so apologies if I made any mistakes.

eleurent commented 4 years ago

how we set gamma will influence how the agent weighs rewards

Increasing gamma means that long-term outcomes will matter more, and thus the planner will tend to explore more uniformly rather than focusing on near-optimal branches at early depths. Hence, it requires an increased sample budget.
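As a rough back-of-the-envelope illustration (this helper is not part of highway-env or rl-agents), the look-ahead depth over which rewards still matter grows quickly with gamma, which is what drives the larger sample budget:

```python
# Illustrative only: the number of steps whose rewards still meaningfully
# contribute to the return grows as gamma approaches 1, so the planner must
# search deeper and needs more samples.
import math

def effective_horizon(gamma: float, eps: float = 0.01) -> int:
    """Smallest H such that gamma**H <= eps, i.e. rewards beyond step H contribute less than eps."""
    return math.ceil(math.log(eps) / math.log(gamma))

for g in (0.7, 0.9, 0.99):
    print(g, effective_horizon(g))
# 0.7 -> 13, 0.9 -> 44, 0.99 -> 459 steps of look-ahead to cover
```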

On setting gamma to 1.0 I received a division by zero error. Any ideas as to why this is happening?

gamma is often chosen in [0, 1) so that the state value sum_t gamma^t * r_t is always defined. In particular, the planner makes use of the maximum possible value sum_t gamma^t * 1 = 1/(1 - gamma) (rewards are also assumed to be bounded in [0, 1]).
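A minimal sketch of that bound, assuming rewards in [0, 1] (the function below is for illustration, not the planner's actual code):

```python
# Illustrative only: with rewards bounded in [0, 1], the discounted return
# sum_t gamma^t * r_t is at most the geometric series 1 / (1 - gamma),
# which the planner uses as an upper bound on node values.

def max_return(gamma: float) -> float:
    """Upper bound on the discounted return for rewards in [0, 1]."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1) for the bound to be finite")
    return 1.0 / (1.0 - gamma)

print(max_return(0.9))   # 10.0
print(max_return(0.99))  # 100.0
# With gamma = 1.0 the series diverges: evaluating 1 / (1 - gamma) is exactly
# the division by zero reported in the question.
```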

Secondly, when I change the environment to Intersection-v0 and the agent to DQN, I received an error of the following format: Trying to step environment which is currently done. While the monitor is active for intersection-v0, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.

Yes, this for loop is missing a reset when the environment reaches a terminal state, because it was intended for tree-based planning. To train a DQN, you should instead use the Evaluation class of rl-agents, which handles these resets automatically. I will probably upload a new script for Intersection + DQN.
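For reference, a minimal sketch of the missing reset, written against the current Gymnasium-style API (the notebook at the time used the older gym Monitor interface with a single done flag, so adapt the step signature accordingly):

```python
import gymnasium as gym
import highway_env  # registers intersection-v0

env = gym.make("intersection-v0")
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # placeholder policy, replace with the agent's action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # Without this reset, stepping a finished episode raises the
        # "Trying to step environment which is currently done" error.
        obs, info = env.reset()
env.close()
```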

eleurent commented 4 years ago

@sparshgarg23 I uploaded a new Intersection + DQN script: https://github.com/eleurent/highway-env/tree/master/scripts
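For anyone landing here later, such a script roughly follows the usual rl-agents pattern; the configuration below is a hedged sketch (the agent class path and settings are examples, take the exact values from the uploaded script):

```python
import gymnasium as gym
import highway_env  # registers intersection-v0
from rl_agents.agents.common.factory import agent_factory
from rl_agents.trainer.evaluation import Evaluation

env = gym.make("intersection-v0")

# Example configuration, not necessarily the one used in the linked script.
agent_config = {
    "__class__": "<class 'rl_agents.agents.deep_q_network.pytorch.DQNAgent'>",
    "gamma": 0.95,
}
agent = agent_factory(env, agent_config)

# Evaluation wraps the training loop and resets the environment between episodes.
evaluation = Evaluation(env, agent, num_episodes=1000)
evaluation.train()
```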

sparshgarg23 commented 4 years ago

Thanks eleurent, I will look into it ASAP.