JuliaPOMDP / POMDPs.jl

MDPs and POMDPs in Julia - An interface for defining, solving, and simulating fully and partially observable Markov decision processes on discrete and continuous spaces.
http://juliapomdp.github.io/POMDPs.jl/latest/
Other
662 stars 100 forks source link

added new handling of terminal states in GenerativeBelief MDP #559

Closed zsunberg closed 1 month ago

zsunberg commented 2 months ago

Termination in GenerativeBeliefMDPs has been a pain point (see #555, #544). This PR changes the default behavior so that when a terminal state is sampled from a belief state, the new state is terminalstate.

zsunberg commented 2 months ago

@WhiffleFish , @himanshugupta1009 , @the-one-and-only-jackson , @FlyingWorkshop you have all dealt with this recently. What do you think?

A downside is that now the state type for GenerativeBeliefMDP is the union of the belief type and TerminalState.

One thing to note is that I added a pdf(b, s) == 0 check to isterminal:

isterminal(bmdp::GenerativeBeliefMDP, b) = all(s -> isterminal(bmdp.pomdp, s) || pdf(b, s) == 0.0, support(b))

so, perhaps that alone could fix most of the problems? Maybe the default behavior should be to just keep trying to update and throw a warning to try terminal_behavior = TerminalStateTerminalBehavior if there is an error.

the-one-and-only-jackson commented 2 months ago

A downside is that now the state type for GenerativeBeliefMDP is the union of the belief type and TerminalState.

I think a union type is necessary if there is a possibility of calling gen on a belief with terminal states in its support, so this seems like a reasonable solution.

In cases where this is not possible (i.e. a terminal state will cause a "terminal" observation, leading to a terminal belief) ContinueTerminalBehavior can be used to preserve type stability, and should be noted in the documentation.

One thing to note is that I added a pdf(b, s) == 0 check to isterminal

Is an identically zero check necessary, or should there be a tolerance check instead?

zsunberg commented 1 month ago

Ok, given @the-one-and-only-jackson 's concurrence, I think this is a good idea. I'll finish the docs and merge. If anyone wants me to stop, let me know soon.

Is an identically zero check necessary, or should there be a tolerance check instead?

I think identically zero is better.