[Proposal] Add transitional probabilities to Taxi and Cliff Walking toy text environments

axb2035 commented 1 year ago

Proposal

Only Frozen Lake in the toy text grid world environments implements transitional probabilities.

Based on the previous documentation, Taxi is supposed to have them, but they have never been implemented: the environment always returns a probability of 1.0.

At the same time cliff walking could also be set up to use transitional probabilities using the same approach.

Motivation

Adding transitional probabilities to taxi will close out a TODO that has been on the list for a long time. It will also bring the environment in line with the source paper, The Fickle Taxi Task - Section 7.1 of Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition (https://www.jair.org/index.php/jair/article/view/10266/24463).

For cliff walking, it presents an opportunity to add depth to the environment and, since it uses the same approach, would not add significantly more time or risk.

Pitch

Taxi: Add transitional probabilities to the taxi toy text environment:

- Leverage the approach from frozen_lake to supply transitional probabilities for movement actions: 0.8 for the intended direction, 0.1 for left and 0.1 for right of the intended direction.
- The paper proposes that, once the taxi has picked up the passenger and moved one square away from the passenger's source location, the passenger changes their destination location with probability 0.3.
- Transition probabilities for pick up and drop off actions remain 1.0.
- Add arguments to enable/disable the features:
  - is_rainy = True | False to enable transitional probabilities on taxi movement, defaults to False.
  - fickle_passenger = True | False to enable the passenger to change their destination once picked up, defaults to False.
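
The two taxi behaviours above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the action constants, the helper names (rainy_move_distribution, sample_direction, maybe_change_destination) and the choice to pick the new destination uniformly among the other locations are all hypothetical.

```python
import random

# Hypothetical action numbering for this sketch only; the real Taxi
# environment may number its movement actions differently.
SOUTH, NORTH, EAST, WEST = 0, 1, 2, 3

# Direction reached when slipping left/right relative to the intended heading.
LEFT_OF = {NORTH: WEST, WEST: SOUTH, SOUTH: EAST, EAST: NORTH}
RIGHT_OF = {NORTH: EAST, EAST: SOUTH, SOUTH: WEST, WEST: NORTH}


def rainy_move_distribution(action):
    """Return [(probability, actual_direction), ...] for a movement action."""
    return [(0.8, action), (0.1, LEFT_OF[action]), (0.1, RIGHT_OF[action])]


def sample_direction(action, rng):
    """Sample the direction actually taken when is_rainy is enabled."""
    probs, directions = zip(*rainy_move_distribution(action))
    return rng.choices(directions, weights=probs, k=1)[0]


def maybe_change_destination(destination, n_locations, rng, p_change=0.3):
    """Fickle passenger: with probability p_change, switch destination.

    Assumption: the new destination is drawn uniformly from the other
    locations; the issue text does not specify how it is chosen.
    """
    if rng.random() < p_change:
        others = [d for d in range(n_locations) if d != destination]
        return rng.choice(others)
    return destination
```

With both flags off, the environment would skip these helpers entirely and keep its current deterministic behaviour.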

Cliff walking: Add transitional probabilities to the cliff walking toy text environment by leveraging the approach from frozen_lake to supply a transitional probability of 1/3 for the intended direction, 1/3 for left and 1/3 for right of the intended direction for movement actions.

Add an argument to enable/disable transitional probabilities:
- is_slippery = True | False to enable transitional probabilities on player movement, defaults to False.
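
As a rough sketch of what the slippery movement could look like on the 4x12 cliff walking grid, assuming the three outcomes are equally likely so that they sum to 1: the helper name slippery_outcomes is hypothetical, and the sketch covers only position updates, not the cliff reset or rewards.

```python
# Grid shape of cliff walking and a clockwise action ordering, so that
# (action - 1) % 4 and (action + 1) % 4 are the perpendicular directions.
SHAPE = (4, 12)
UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3
DELTAS = {UP: (-1, 0), RIGHT: (0, 1), DOWN: (1, 0), LEFT: (0, -1)}


def slippery_outcomes(row, col, action, is_slippery=False):
    """Return [(probability, (row, col)), ...] for one movement action.

    Hypothetical helper: handles only the position update and clamps
    moves at the grid border; cliff handling and rewards are left out.
    """
    if is_slippery:
        # Left of intended, intended, right of intended.
        candidates = [(action - 1) % 4, action, (action + 1) % 4]
    else:
        candidates = [action]
    prob = 1.0 / len(candidates)
    outcomes = []
    for a in candidates:
        d_row, d_col = DELTAS[a]
        new_row = min(max(row + d_row, 0), SHAPE[0] - 1)
        new_col = min(max(col + d_col, 0), SHAPE[1] - 1)
        outcomes.append((prob, (new_row, new_col)))
    return outcomes
```

For example, from the bottom-left cell (3, 0), slippery_outcomes(3, 0, UP, is_slippery=True) yields three outcomes of probability 1/3 each: staying in place (slipping left into the wall), moving up, or moving right.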

For both:

- Update unit tests.
- Update documentation.
- Increment versions in registry.

Alternatives

- Do nothing. This misses an opportunity to make the toy_text environments consistent and more useful for beginner RL practitioners.
- Remove transitional probability from taxi and/or cliff walking. In either case prob would be removed from the info returned. This removes the need to complete the taxi work and simplifies any ongoing maintenance.

Additional context

No response

Checklist

[X] I have checked that there is no similar issue in the repo

axb2035 commented 1 year ago

I would like to do this one as it will help my understanding of the toy text environments.

riiswa commented 11 months ago

Hey @axb2035, are you still on it? I need these stochastic environments for my project, so I can open a PR for this issue (if you're not on it, of course :)). (cc @pseudo-rnd-thoughts)

axb2035 commented 11 months ago

@riiswa thanks for the note. At this stage I am not actively working on it, so please progress. :)

CloseChoice commented 4 weeks ago

I see some members in #661 questioning whether it is really wanted from the maintainers. Could we get a statement here before someone puts the effort in? @RedTachyon @pseudo-rnd-thoughts

pseudo-rnd-thoughts commented 4 weeks ago

@CloseChoice I might have closed #661 by accident, apologies to @riiswa if that was the case.

If we can add what @axb2035 discussed in this issue as an optional feature that extends the current implementation, then I think we should be able to add the features discussed. From what @axb2035 wrote above, #661 seemed to be going in the correct direction; it was missing the probability of the action taken (info["prob"]) and a couple of other sections. @CloseChoice or @riiswa, if you wish to finish the work, go for it and I can review at the end.
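
For reference, a minimal sketch of how the probability of the sampled transition could be reported in info["prob"]. The transition table P and the step helper below are hypothetical stand-ins written for illustration; they are not the code from #661.

```python
import random

# Hypothetical transition table: P[state][action] is a list of
# (probability, next_state, reward, terminated) tuples.
P = {
    0: {
        0: [(0.8, 1, -1, False), (0.1, 0, -1, False), (0.1, 2, -1, False)],
    },
}


def step(state, action, rng):
    """Sample one transition and report its probability in info["prob"]."""
    transitions = P[state][action]
    weights = [t[0] for t in transitions]
    prob, next_state, reward, terminated = rng.choices(
        transitions, weights=weights, k=1
    )[0]
    truncated = False  # this sketch never truncates an episode
    return next_state, reward, terminated, truncated, {"prob": prob}
```

When stochastic transitions are disabled, each list would hold a single tuple with probability 1.0, so info["prob"] would stay at 1.0 as it does today.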

CloseChoice commented 4 weeks ago

Thanks for the clarification. Then I'll go for the Cliff walking environment and give @riiswa some more time to finish up the work on Taxi.

Farama-Foundation / Gymnasium