JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/
Other
664 stars 100 forks source link

Remove latest observation from CommonRLEnv POMDP state #516

Closed johannes-fischer closed 1 year ago

johannes-fischer commented 1 year ago

I find it inconsistent that state(::MDPCommonRLEnv) returns the state (converted to type RLO) but state(::POMDPCommonRLEnv) returns a state-observation tuple, without converting anything. Since the observation is not really part of the state, I think it makes sense to not provide it as part of state(::POMDPCommonRLEnv). Additionally, the state should also be optionally converted to another type, just like the observation.

This PR implements the proposed changes.

zsunberg commented 1 year ago

The original reason for including the observation in the env state is: If the observation is not included, someone can call observe(env) multiple times and get different results, and by calling observe many times, could get a better estimate of the state than is actually "allowed" by the POMDP.

I suppose that perhaps this is more confusing than beneficial though.

Does hearing the original reason change your opinion at all @johannes-fischer ?

The best path forward would probably be to make it an option.

johannes-fischer commented 1 year ago

I get the reason for including the observation in the POMDPCommonRLEnv struct, which I left unchanged. This allows to return the same observation for subsequent calls to observe(env). However, this does not require to also return the observation as part of state(env) does it?

With this PR, subsequent calls to observe will still return the same observation. Getting different observations for the same env state can only happen with

observe(env)
state = state(env)
setstate!(state) # calls `initialobs`
observe(env)

I don't think this will be called accidentally, so I would not worry about this potential state information leak. The same information leak is accessible in the current implementation by using o = rand(initialobs(env.m, state(env)[1]).

zsunberg commented 1 year ago

Sorry for the delay - the semester started up last week.

I see what you're saying now and you're definitely right. Thanks for the fix! The only thing I changed is that you no longer need CommonRLInterface.@provide. will merge once the tests pass!

johannes-fischer commented 1 year ago

No worries, sounds good!