Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License
7.14k stars 793 forks

[Proposal] On the meaning of "terminated" "truncated" "done" etc. I miss "causality break" flag #194

Closed jamartinh closed 9 months ago

jamartinh commented 1 year ago

Proposal

Arrive at a broad agreement on the meaning of "done", "terminated" and "truncated".

Motivation

With the new API, we now have two boolean variables (terminated, truncated) instead of one, which gives four possible combinations. However, the Gymnasium docs say that reset should be called when terminated=True and also when truncated=True. So with respect to resetting the environment there seems to be no difference between them, at least in specification and intention.
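
For reference, a minimal sketch of the episode loop this paragraph describes. `CountdownEnv` is a hypothetical toy environment written here only to mimic the five-value `step` return and the two-value `reset` return of the Gymnasium API; it is not part of Gymnasium, and `run` is an illustrative helper.

```python
class CountdownEnv:
    """Hypothetical toy environment mimicking the Gymnasium step/reset signatures."""

    def __init__(self, start=3, max_steps=2):
        self.start = start          # the state counts down from this value
        self.max_steps = max_steps  # time limit, in the spirit of a time-limit wrapper
        self.state = start
        self.elapsed = 0

    def reset(self):
        self.state = self.start
        self.elapsed = 0
        return self.state, {}

    def step(self, action):
        self.state -= action
        self.elapsed += 1
        terminated = self.state <= 0                # terminal state of the MDP reached
        truncated = self.elapsed >= self.max_steps  # cut off by the time limit
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, truncated, {}


def run(env, policy, total_steps):
    """Step the environment, resetting whenever either flag is set."""
    resets = 0
    obs, info = env.reset()
    for _ in range(total_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        if terminated or truncated:  # both flags lead to the same reset
            obs, info = env.reset()
            resets += 1
    return resets
```

As far as resetting goes, the loop cannot tell the two flags apart, which is the observation made above.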

It also seems that both conditions point to a very significant event:

The rupture of the causal chain in the temporal series of the MDP.

This rupture, signaled previously by "done" and now by both "terminated" and "truncated", has been widely used in RL libraries for several things, such as processing the data in the replay buffer or deciding certain conditions in the programmed "collectors".

However, there are situations where you need to treat a sequence as "complete", e.g. for n-step returns or for computing the total accumulated discounted return, even though the causal chain does not end, as in continuing tasks. You still want to communicate this to the outside so that collectors can save stats and process the replay buffer memory, but a reset should not be performed.
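
To make the n-step case concrete, here is a sketch of how a collector might compute a discounted return target from a recorded segment. It assumes the common convention that a terminated segment has no future value, while a segment that is truncated or merely cut for bookkeeping bootstraps from a value estimate. `segment_return` and `value_of_last_obs` are illustrative names, not taken from any library.

```python
def segment_return(rewards, gamma, terminated, value_of_last_obs=0.0):
    """Discounted return of a reward segment, bootstrapping unless terminated.

    rewards: rewards r_0 .. r_{n-1} collected along the segment
    terminated: True if the segment ended in a terminal state of the MDP
    value_of_last_obs: estimate of V(s_n) for the observation after the last reward
    """
    # A terminal state has zero future value. Any other cut (time limit,
    # collector bookkeeping) leaves the MDP running, so the tail is estimated.
    ret = 0.0 if terminated else value_of_last_obs
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret
```

The function needs to know whether the segment ended in a terminal state, but not whether a reset followed, which is the separation of concerns this paragraph asks for.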

Of course, we can add this to the info field, but by the same logic "truncated" could have stayed inside info, as was done previously.

So basically, it would be useful to take advantage of the four possible true/false combinations of these two variables and to reach a consensus on their meaning, so that RL library developers can take it into account.
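
The four combinations can be enumerated directly. The sketch below records one commonly assumed reading of the two flags (reset on either flag, bootstrap unless terminated); this is an assumption about typical library behaviour, not an agreed specification, and `interpret` is a hypothetical helper.

```python
from itertools import product


def interpret(terminated, truncated):
    """Commonly assumed reading of the two flags (not an official spec)."""
    return {
        "needs_reset": terminated or truncated,  # reset is called on either flag
        "bootstrap": not terminated,             # only a terminal state has zero future value
    }


# Build the full table for all four (terminated, truncated) combinations.
table = {flags: interpret(*flags) for flags in product([False, True], repeat=2)}
for (terminated, truncated), meaning in table.items():
    print(f"terminated={terminated!s:5} truncated={truncated!s:5} -> {meaning}")
```

Under this reading, no combination expresses "the sequence is complete for bookkeeping purposes, but no reset is needed", which is the gap this proposal points at.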

Which event now signals a causal chain rupture exclusively, without requiring a reset?

Pitch

Arrive at a broad agreement/consensus on the meaning of "done", "terminated" and "truncated", and take the causal chain rupture into account.

Alternatives

Add a specific output variable to step that indicates a causal chain rupture.

Additional context

No response

pseudo-rnd-thoughts commented 1 year ago

Interesting question. I'm not sure I totally understand what you mean. Could you look at this page and explain which definition we are missing?

Kallinteris-Andreas commented 9 months ago

This issue has been inactive for over a year, and the OP uses the term "causal chain rupture", whose meaning I cannot work out. Closing.