Closed GoogleCodeExporter closed 8 years ago
This is somewhat true: it can be awkward to tell whether the episode ended or was cut off.
I just read carefully through the code here:
http://rl-glue.googlecode.com/svn/trunk/RL-Glue/RL_glue.c
And I think we are saved.
RL_start() gets the first state from the env, and then gets the first action from the agent. num_steps <- 1.
Now there is a loop that runs while !terminal and while num_steps < max_steps (the timeout).
In the loop, we have RL_step, which calls env_step with the last action, and then calls agent_step with the new state.
If things are terminal, agent_end is called instead and num_steps is NOT incremented, because the agent only took 1 step.
If things are not terminal, num_steps is incremented and we go back to the while loop.
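The calling sequence described above can be sketched roughly as follows. This is a minimal illustration and not the code in RL_glue.c: ToyEnv, EpisodeResult, toy_agent_choose, toy_env_step, and sketch_episode are all hypothetical names, and the toy environment (a counter that terminates on reaching a goal) is an assumption made only so the loop can run on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy environment: the episode is terminal once state >= goal. */
typedef struct {
    int state;
    int goal;
} ToyEnv;

/* Hypothetical result type: the step counter plus whether the
   environment actually signalled termination. */
typedef struct {
    int num_steps;
    bool terminal;
} EpisodeResult;

/* Stand-in for agent_start / agent_step: always chooses action +1. */
static int toy_agent_choose(int state) {
    (void)state;
    return 1;
}

/* Stand-in for env_step: applies the action, reports whether we are terminal. */
static bool toy_env_step(ToyEnv *env, int action) {
    env->state += action;
    return env->state >= env->goal;
}

/* Sketch of the RL_start + RL_step loop as described in this comment. */
EpisodeResult sketch_episode(int goal, int max_steps) {
    assert(max_steps >= 1); /* this sketch assumes a positive step limit */

    ToyEnv env = { 0, goal };
    EpisodeResult result = { 0, false };

    /* RL_start: first state from the env, first action from the agent. */
    int action = toy_agent_choose(env.state);
    result.num_steps = 1;

    while (!result.terminal && result.num_steps < max_steps) {
        /* RL_step: env_step with the last action. */
        result.terminal = toy_env_step(&env, action);
        if (!result.terminal) {
            /* agent_step with the new state; only now is the counter bumped. */
            action = toy_agent_choose(env.state);
            result.num_steps++;
        }
        /* In the terminal case agent_end would be called; no increment. */
    }
    return result;
}
```

The terminal flag in EpisodeResult is tracked only to make the two ways of leaving the loop visible; the counter alone does not carry that information directly.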
This calling sequence suggests that if you called RL_episode with a step limit of 2, then the options are:
1) Agent takes 1 action and reaches terminal state. NUM_STEPS=1. Episode completed.
2) Agent takes 1 action and reaches non-terminal state. Agent chooses another action, but that won't be used until the following call to RL_step. NUM_STEPS=2 and RL_episode returns without the episode having reached a terminal state.
So, you see, calling RL_episode(k) really asks whether the agent can terminate the episode in k-1 actions. So, if RL_episode returns and RL_NUM_STEPS<k, then the agent succeeded. If RL_NUM_STEPS==k, then the episode literally did not complete. Neither RL_Glue nor the environment knows whether the agent's last action would have completed the episode.
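On the caller's side, the rule above amounts to one comparison. The helper below is a hypothetical sketch, not an RL-Glue function: it assumes the caller kept the step limit k it passed to RL_episode and read the step count afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of RL-Glue: applies the rule described
   above to a finished RL_episode(step_limit) call. */
bool episode_was_cut_off(int num_steps, int step_limit) {
    /* Under the loop described in this thread the counter starts at 1
       and never exceeds the limit. */
    assert(step_limit >= 1);
    assert(num_steps >= 1 && num_steps <= step_limit);

    /* num_steps < limit: terminal state reached; num_steps == limit: cut off. */
    return num_steps == step_limit;
}
```

For a step limit of 2, a count of 1 reads as a completed episode and a count of 2 reads as a cutoff.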
We should argue whether this is actually the desired behavior or not.
I guess I could argue that you should just
There have been some discussions about this and related issues with RL_episode, such as whether there should be a way to continue an episode after cutting it off: run the episode for a while, check in on things, and then continue.
Original comment by brian.ta...@gmail.com
on 29 Jan 2008 at 6:20
We should figure out what we're going to do about this one.
Original comment by brian.ta...@gmail.com
on 2 Sep 2008 at 9:57
RL_episode will have a return value (int) to distinguish between cutoff and terminal.
Original comment by bttan...@gmail.com
on 3 Sep 2008 at 10:38
Original comment by bttan...@gmail.com
on 4 Sep 2008 at 4:05
r788 should handle this. I'm a little unsure about some of the network stuff though, so the test suite needs to be really good. I also fixed it up in the codec.
Original comment by brian.ta...@gmail.com
on 7 Sep 2008 at 6:03
Original issue reported on code.google.com by
Csaba.Szepesvari
on 31 Oct 2007 at 1:48