davidljung / rl-glue

Automatically exported from code.google.com/p/rl-glue

There should be a way to learn if an episode ended because of 'timeout' or just 'normally' #39

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
When calling RL_episode( s ) the episode is run for at most s steps.
When RL_episode returns I can query the total number of steps spent in the
'episode', but if the task is really episodic there is currently no way to
check if the episode was completed because the number of steps was
exhausted or because the agent reached a terminal state.
There should be a function called e.g. RL_episode_completed() to check
this. The idea is that the person writing the code to run
experiments/calculate statistics should be able to check if the last
episode was normally completed without any cooperation from the agents
(otherwise one could use the agent_end() function for checking this).

Original issue reported on code.google.com by Csaba.Szepesvari on 31 Oct 2007 at 1:48

GoogleCodeExporter commented 8 years ago
This is somewhat true: it can be awkward to know whether the episode ended
or was cut off.

I just read carefully through the code here:
http://rl-glue.googlecode.com/svn/trunk/RL-Glue/RL_glue.c

And I think we are saved.

RL_start() gets the first state from the env, and then gets the first action
from the agent. num_steps is set to 1.

Now there is a loop that runs while !terminal and while num_steps<max_steps
(the timeout).

In the loop, we have RL_step, which calls env_step with the last action, and
then calls agent_step with the new state.

If things are terminal, agent_end is called and num_steps is NOT incremented,
because the agent only took 1 step.

If things are not terminal, num_steps is incremented and we go back to the
while loop.

This calling sequence suggests that if you called RL_episode with a step
limit of 2, then the options are:

1)  Agent takes 1 action and reaches a terminal state.  NUM_STEPS=1.  Episode
completed.

2)  Agent takes 1 action and reaches a non-terminal state.  Agent chooses
another action, but that won't be used until the following call to RL_step.
NUM_STEPS=2 and the episode is cut off.

So, you see, calling RL_episode(k) really asks whether the agent can terminate
the episode in k-1 actions.  If RL_episode returns and RL_NUM_STEPS<k, then
the agent succeeded.  If RL_NUM_STEPS==k, then the episode literally did not
complete.  Not even RL_Glue or the environment knows whether the agent's last
action would have completed the episode.
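
The counting rule described above can be checked with a small stand-alone
sketch. This is not the RL_glue.c code: ToyEnv, toy_env_step and toy_episode
are hypothetical names made up for illustration, and the toy environment
simply becomes terminal after a fixed number of environment steps. It only
mirrors the control flow as described in this comment.

```c
#include <assert.h>

/* Hypothetical toy environment (not part of RL-Glue): the episode becomes
   terminal after `steps_to_goal` environment steps. */
typedef struct {
    int steps_to_goal;  /* env steps needed to reach the terminal state */
    int steps_taken;    /* env steps taken so far in this episode */
} ToyEnv;

/* Applies one action; returns 1 if that step reached the terminal state. */
static int toy_env_step(ToyEnv *env) {
    env->steps_taken++;
    return env->steps_taken >= env->steps_to_goal;
}

/* Mirrors the loop described above: the start counts as step 1, every
   non-terminal step increments num_steps, a terminal step does not.
   Returns num_steps and writes the terminal flag to *terminal_out. */
static int toy_episode(ToyEnv *env, int max_steps, int *terminal_out) {
    assert(max_steps >= 1);
    int num_steps = 1;  /* start: first observation, first action chosen */
    int terminal = 0;
    env->steps_taken = 0;
    while (!terminal && num_steps < max_steps) {
        terminal = toy_env_step(env);  /* apply the pending action */
        if (!terminal) {
            num_steps++;  /* agent chose another action, not yet applied */
        }
    }
    *terminal_out = terminal;
    return num_steps;
}
```

Under these assumptions, a limit of k yields num_steps<k exactly when the toy
environment terminates within k-1 actions, and num_steps==k with the terminal
flag unset otherwise, which matches the two cases listed for a limit of 2.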

We should argue whether this is actually the desired behavior or not.

I guess I could argue that you should just 

There have been some discussions about this and related issues with
RL_episode, such as whether there should be a way to continue an episode
after it has been cut off: run the episode for a while, check in on things,
and then continue.

Original comment by brian.ta...@gmail.com on 29 Jan 2008 at 6:20

GoogleCodeExporter commented 8 years ago
We should figure what we're going to do about this one.

Original comment by brian.ta...@gmail.com on 2 Sep 2008 at 9:57

GoogleCodeExporter commented 8 years ago
RL_episode will have a return value (int) to distinguish between cutoff and
terminal.
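
A rough sketch of how experiment code might use such a return value to tally
completed episodes without any cooperation from the agent. The thread does
not say which integer means what, so the encoding here (1 = terminal,
0 = cutoff) is an assumption, and toy_episode_status and count_completed are
hypothetical stand-ins rather than RL-Glue functions.

```c
#include <assert.h>

/* Assumed encoding; the actual values are not stated in this thread. */
enum { EPISODE_CUTOFF = 0, EPISODE_TERMINAL = 1 };

/* Hypothetical stand-in for an RL_episode that reports how it ended.
   The toy task reaches its terminal state after `steps_to_goal` env steps. */
static int toy_episode_status(int steps_to_goal, int max_steps) {
    assert(steps_to_goal >= 1 && max_steps >= 1);
    int num_steps = 1;  /* the start counts as the first step */
    int env_steps = 0;
    int terminal = 0;
    while (!terminal && num_steps < max_steps) {
        env_steps++;
        terminal = (env_steps >= steps_to_goal);
        if (!terminal) {
            num_steps++;
        }
    }
    return terminal ? EPISODE_TERMINAL : EPISODE_CUTOFF;
}

/* Counts how many of the n episodes ended in a terminal state. */
static int count_completed(const int *steps_to_goal, int n, int max_steps) {
    int completed = 0;
    for (int i = 0; i < n; i++) {
        if (toy_episode_status(steps_to_goal[i], max_steps) == EPISODE_TERMINAL) {
            completed++;
        }
    }
    return completed;
}
```

With a step limit of 5, toy tasks needing 1, 3 or 4 environment steps count
as completed, while tasks needing 5 or more are reported as cut off.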

Original comment by bttan...@gmail.com on 3 Sep 2008 at 10:38

GoogleCodeExporter commented 8 years ago

Original comment by bttan...@gmail.com on 4 Sep 2008 at 4:05

GoogleCodeExporter commented 8 years ago
r788 should handle this.  I'm a little unsure about some of the network stuff
though, so the test suite needs to be really good.  I also fixed it up in the
codec.

Original comment by brian.ta...@gmail.com on 7 Sep 2008 at 6:03