Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Proposal] Add transitional probabilities to Taxi and Cliff Walking toy text environments #161

Open axb2035 opened 1 year ago

axb2035 commented 1 year ago

Proposal

Of the toy text grid world environments, only Frozen Lake implements transitional probabilities.

According to the previous documentation, Taxi is supposed to support them, but this was never implemented: the reported probability is always 1.0.

Cliff Walking could also be set up to use transitional probabilities with the same approach.

Motivation

Adding transitional probabilities to taxi will close out a TODO that has been on the list for a long time. It will also bring the environment in line with the source paper, The Fickle Taxi Task - Section 7.1 of Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition (https://www.jair.org/index.php/jair/article/view/10266/24463).

For Cliff Walking, it presents an opportunity to add depth to the environment, and since it uses the same approach, it would not add significantly more time or risk.

Pitch

Taxi: add transitional probabilities to the Taxi toy text environment:

Cliff Walking: add transitional probabilities to the Cliff Walking toy text environment by reusing the approach from frozen_lake, so that a movement action goes in the intended direction with probability 1/3, and to the left or right of the intended direction with probability 1/3 each.
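A minimal sketch of the Cliff Walking idea, assuming the Frozen Lake convention in which `P[s][a]` is a list of `(probability, next_state, reward, terminated)` tuples and a slippery move goes in the intended direction or either perpendicular direction with probability 1/3 each. The grid layout (4x12, start bottom-left, goal bottom-right, cliff cells in between that give -100 and reset to the start), the clockwise action order and the helper names `move` and `build_slippery_P` are illustrative assumptions, not the eventual Gymnasium code.

```python
# Hypothetical sketch: a slippery 4x12 Cliff Walking transition table in the
# toy-text convention P[s][a] = [(prob, next_state, reward, terminated), ...].
N_ROWS, N_COLS = 4, 12
START = (3, 0)
GOAL = (3, N_COLS - 1)
CLIFF = {(3, c) for c in range(1, N_COLS - 1)}
# Assumed action order: 0=up, 1=right, 2=down, 3=left (clockwise), so
# (a - 1) % 4 and (a + 1) % 4 are the two directions perpendicular to a.
DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def to_state(pos):
    """Flatten a (row, col) position into a state index."""
    return pos[0] * N_COLS + pos[1]


def move(pos, action):
    """Deterministic outcome of taking `action` from `pos` (hypothetical helper)."""
    dr, dc = DELTAS[action]
    # Moves off the grid are clipped, leaving the agent on the edge.
    r = min(max(pos[0] + dr, 0), N_ROWS - 1)
    c = min(max(pos[1] + dc, 0), N_COLS - 1)
    if (r, c) in CLIFF:
        # Falling off the cliff costs -100 and sends the agent back to start.
        return to_state(START), -100.0, False
    return to_state((r, c)), -1.0, (r, c) == GOAL


def build_slippery_P():
    """Build P with 1/3 intended direction and 1/3 for each perpendicular one."""
    P = {}
    for r in range(N_ROWS):
        for c in range(N_COLS):
            s = to_state((r, c))
            P[s] = {}
            for a in range(4):
                outcomes = []
                for b in ((a - 1) % 4, a, (a + 1) % 4):
                    next_state, reward, terminated = move((r, c), b)
                    outcomes.append((1.0 / 3.0, next_state, reward, terminated))
                P[s][a] = outcomes
    return P
```

For example, `build_slippery_P()[36][1]` (moving right from the start cell) contains one outcome that slips up, one that falls into the cliff and one that slips down against the grid edge and stays put. As in Frozen Lake, duplicate next states are kept as separate entries rather than merged.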

For both:

Alternatives

  1. Do nothing. This misses an opportunity to make the toy_text environments consistent and more useful for beginner RL practitioners.

  2. Remove transitional probabilities from Taxi and/or Cliff Walking. In either case, prob would be removed from the returned info. This removes the need to complete the Taxi work and simplifies ongoing maintenance.

Additional context

No response

Checklist

axb2035 commented 1 year ago

I would like to do this one as it will help my understanding of the toy text environments.

riiswa commented 11 months ago

Hey @axb2035, are you still on it? I need these stochastic environments for my project, so I can open a PR for this issue (if you're not working on it, of course :)). (cc @pseudo-rnd-thoughts)

axb2035 commented 11 months ago

@riiswa thanks for the note. At this stage I am not actively working on it, so please go ahead. :)

CloseChoice commented 4 weeks ago

I see some members in #661 questioning whether the maintainers actually want this. Could we get a statement here before someone puts in the effort? @RedTachyon @pseudo-rnd-thoughts

pseudo-rnd-thoughts commented 4 weeks ago

@CloseChoice I might have closed #661 by accident, apologies to @riiswa if that was the case.

If we can add what @axb2035 discussed in this issue as an optional feature that extends the current implementation then I think we should be able to add the features discussed. From what @axb2035 wrote above, #661 seemed to be going in the correct direction, it was missing the probability of the action taken (info["prob"]) and a couple of other sections. @CloseChoice or @riiswa if you wish to finish the work, go for it and I can review at the end
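To illustrate what info["prob"] refers to, here is a hypothetical `step` function over a transition table of the same `(prob, next_state, reward, terminated)` shape: it samples one outcome and returns the probability of the sampled transition in the info dict, which, to my understanding, mirrors how Frozen Lake reports it. The two-state table and the `step` helper are made up for illustration, and the standard library's `random.Random.choices` stands in for whatever sampler the environments actually use.

```python
import random

# Hypothetical two-state table in the toy-text convention
# P[state][action] = [(prob, next_state, reward, terminated), ...].
P = {
    0: {0: [(0.8, 1, 1.0, True), (0.2, 0, 0.0, False)]},
    1: {0: [(1.0, 1, 0.0, True)]},
}


def step(P, state, action, rng):
    """Sample one transition and report its probability in info["prob"]."""
    transitions = P[state][action]
    weights = [t[0] for t in transitions]
    i = rng.choices(range(len(transitions)), weights=weights, k=1)[0]
    prob, next_state, reward, terminated = transitions[i]
    # truncated is always False here; truncation is left to a time-limit wrapper.
    return next_state, reward, terminated, False, {"prob": prob}
```

With a deterministic table every list has a single entry of probability 1.0, which is why the current Taxi always reports 1.0.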

CloseChoice commented 4 weeks ago

Thanks for the clarification. Then I'll take on the Cliff Walking environment and give @riiswa some more time to finish the work on Taxi.