Solutions of MDPs - Githubissues

Farama-Foundation / Minigrid

Simple and easily configurable grid world environments for reinforcement learning

https://minigrid.farama.org/

Solutions of MDPs #174

Closed simonsays1980 closed 2 years ago

simonsays1980 commented 2 years ago

Hi,

and thanks for providing this environment to the community! I am planning to use MiniGrid for some research on generalization. For this I would like to compute how far learned policies deviate from the optimal one, how far learned Q-values deviate from the optimal ones, etc.

Is there any chance to do this with the information contained in the environment?

I mean, unlike the FrozenLake environment, the MiniGrid environments do not expose the transition probabilities.

Do you have any idea?

maximecb commented 2 years ago

You could compute an approximation to the optimal policy by training several agents for a long time on environment A, and then train an agent on environment B, and ask it to generalize to environment A. Then you could look at the difference between the best policy trained specifically for A vs the policy you trained on B and transferred to A.

simonsays1980 commented 2 years ago

@maximecb Thanks for the answer! That is a nice way to circumvent finding the theoretical solution. How complicated do you think it is to construct the MDP for the MiniGrid environments (with the MDP in hand, value iteration etc. becomes possible)? Especially when they are initialized randomly.
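For a small deterministic grid, the MDP can be enumerated by hand and solved with value iteration. The following is only a minimal sketch under stated assumptions: it models a hypothetical 4x4 empty grid with three actions and does not use the MiniGrid API. The names `step`, `value_iteration` and `max_q_error`, the discount factor, and the reward of 1 on reaching the goal are all illustrative choices (MiniGrid's own reward depends on the step count, which this sketch ignores).

```python
import itertools

# Hypothetical toy model of an empty grid; this is NOT the MiniGrid API.
SIZE = 4                                    # cells per side (assumption)
GOAL = (SIZE - 1, SIZE - 1)                 # goal in the bottom-right corner
GAMMA = 0.9                                 # discount factor (assumption)
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # right, down, left, up
ACTIONS = ("left", "right", "forward")


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    x, y, d = state
    if action == "left":
        d = (d - 1) % 4
    elif action == "right":
        d = (d + 1) % 4
    else:
        # Forward: bumping into the border leaves the agent in place.
        nx, ny = x + DIRS[d][0], y + DIRS[d][1]
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            x, y = nx, ny
    done = (x, y) == GOAL
    return (x, y, d), (1.0 if done else 0.0), done


# All non-terminal states: (x, y, direction) with the agent not on the goal.
STATES = [
    s for s in itertools.product(range(SIZE), range(SIZE), range(4))
    if s[:2] != GOAL
]


def q_value(values, state, action):
    """One-step lookahead using the current value estimates."""
    nxt, reward, done = step(state, action)
    return reward + (0.0 if done else GAMMA * values[nxt])


def value_iteration(tol=1e-10):
    """In-place value iteration until the largest update is below tol."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(q_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


optimal_values = value_iteration()
optimal_q = {
    (s, a): q_value(optimal_values, s, a) for s in STATES for a in ACTIONS
}


def max_q_error(learned_q):
    """Largest absolute gap between a learned Q-table and the optimal one."""
    return max(abs(learned_q[key] - optimal_q[key]) for key in optimal_q)
```

A learned Q-table keyed by `(state, action)` can be passed to `max_q_error` to get a worst-case deviation. With randomly initialized layouts, the same enumeration would have to be repeated per layout, and the state space grows quickly once objects such as keys and doors are involved.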

maximecb commented 2 years ago

Honestly not sure. It's going to be a lot faster for smaller environments. You can also fix the starting configuration by calling env.seed(your_seed_value) each time you do env.reset() if you want.
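The seed-then-reset pattern can be sketched as follows. This is an illustration only: `ToyGridEnv` and `fixed_reset` are hypothetical stand-ins, not MiniGrid classes, and they just mimic an environment whose `reset()` draws the start configuration from its own random number generator.

```python
import random


class ToyGridEnv:
    """Hypothetical stand-in for an environment with random start states."""

    def __init__(self, size=8):
        self.size = size
        self.rng = random.Random()

    def seed(self, value):
        # Re-seed the environment's private random number generator.
        self.rng.seed(value)

    def reset(self):
        # Sample the agent position and direction from the private RNG.
        x = self.rng.randrange(self.size)
        y = self.rng.randrange(self.size)
        d = self.rng.randrange(4)
        return (x, y, d)


def fixed_reset(env, seed_value):
    """Seed before every reset so each episode starts identically."""
    env.seed(seed_value)
    return env.reset()


env = ToyGridEnv()
first = fixed_reset(env, 42)
second = fixed_reset(env, 42)
```

Both calls return the same start state because the generator is re-seeded before each reset. More recent Gym and Gymnasium releases pass the seed directly as `env.reset(seed=...)` instead of a separate `seed()` call.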

simonsays1980 commented 2 years ago

Thanks for the insights @maximecb !

Regarding the randomness: is NonDeterministic a hyperparameter (to inject randomness) or solely a description of the environment? I guess I need to dive deeper into how the environments work.

There is something I noticed when training some agents on EmptyGrid-8x8-v0: in the rendered videos the agent sometimes stands still, as if there were also a wait or do-nothing action. Is this the case?

[Video attachment: openaigym video 3 460738 video000260]

maximecb commented 2 years ago

The environment is deterministic, but the position of the agent and the configuration of objects can vary randomly from episode to episode, depending on the seed.

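There are actions like toggle and drop which don't apply to all environments, so when the agent picks one of them in an empty grid nothing changes and it appears to stand still: https://github.com/maximecb/gym-minigrid/blob/master/gym_minigrid/minigrid.py#L638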
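To make the action set concrete, here is a sketch of the action enumeration as it appears in MiniGrid, reproduced from memory, so the exact names and numbering should be checked against the linked source for the version in use. The helper `moves_agent` is a hypothetical illustration of which actions can change the agent's pose in a grid with no objects to interact with.

```python
from enum import IntEnum


class Actions(IntEnum):
    # Assumed to mirror MiniGrid's action enumeration; verify per version.
    left = 0      # turn left
    right = 1     # turn right
    forward = 2   # move forward
    pickup = 3    # pick up an object
    drop = 4      # drop the carried object
    toggle = 5    # toggle or activate an object
    done = 6      # signal that the task is done


def moves_agent(action):
    """Hypothetical helper: True if the action can change the agent's pose."""
    return action in (Actions.left, Actions.right, Actions.forward)


# Actions that leave the agent where it is when there is nothing to use.
idle = [a.name for a in Actions if not moves_agent(a)]
```

Under these assumptions, four of the seven actions leave the agent in place in an empty grid, so a policy that still explores will visibly pause. Restricting the agent to the first three actions is one way to avoid this.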