Closed johannes-fischer closed 1 year ago
The original reason for including the observation in the env state is: If the observation is not included, someone can call observe(env)
multiple times and get different results, and by calling observe many times, could get a better estimate of the state than is actually "allowed" by the POMDP.
I suppose that perhaps this is more confusing than beneficial though.
Does hearing the original reason change your opinion at all @johannes-fischer ?
The best path forward would probably be to make it an option.
I get the reason for including the observation in the POMDPCommonRLEnv
struct, which I left unchanged. This allows to return the same observation for subsequent calls to observe(env)
. However, this does not require to also return the observation as part of state(env)
does it?
With this PR, subsequent calls to observe
will still return the same observation. Getting different observations for the same env state can only happen with
observe(env)
state = state(env)
setstate!(state) # calls `initialobs`
observe(env)
I don't think this will be called accidentally, so I would not worry about this potential state information leak. The same information leak is accessible in the current implementation by using o = rand(initialobs(env.m, state(env)[1])
.
Sorry for the delay - the semester started up last week.
I see what you're saying now and you're definitely right. Thanks for the fix! The only thing I changed is that you no longer need CommonRLInterface.@provide
. will merge once the tests pass!
No worries, sounds good!
I find it inconsistent that
state(::MDPCommonRLEnv)
returns the state (converted to typeRLO
) butstate(::POMDPCommonRLEnv)
returns a state-observation tuple, without converting anything. Since the observation is not really part of the state, I think it makes sense to not provide it as part ofstate(::POMDPCommonRLEnv)
. Additionally, the state should also be optionally converted to another type, just like the observation.This PR implements the proposed changes.