DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Can't solve Gymnasium Frozenlake-v1 8x8 with A2C #1670

Open · MetallicaSPA opened 11 months ago

MetallicaSPA commented 11 months ago

❓ Question

Hello, I'm trying to solve the FrozenLake-v1 environment with is_slippery=True (non-deterministic) using the Stable Baselines3 A2C algorithm. I can solve the 4x4 version, but I can't achieve any results with the 8x8 version. I also checked the RL Zoo to see if there is any hyperparameter tuning for that environment, but there is nothing. What adjustments can I make to get it working properly?
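For context, a minimal version of the setup being described (this is an assumed reconstruction; the exact training script isn't shown in the thread):

```python
import gymnasium as gym

from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

# Non-deterministic 8x8 map: the agent slips, so each action
# only moves in the intended direction with probability 1/3
env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)  # budget is illustrative

# Reward is 1 only when the goal is reached, so the mean episode
# reward is also the success rate
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```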


araffin commented 11 months ago

Hello,

> What adjustments can I make to get it working properly?

Have you tried other algorithms? Hyperparameter tuning? (included in the zoo, or you can have a look at https://araffin.github.io/post/hyperparam-tuning/)
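For illustration, a bare-bones Optuna loop over a few A2C hyperparameters could look like the sketch below; the search space and budgets are arbitrary examples, not the zoo's tuned settings:

```python
import gymnasium as gym
import optuna

from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy


def objective(trial: optuna.Trial) -> float:
    # Illustrative search space; ranges are examples only
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    ent_coef = trial.suggest_float("ent_coef", 1e-8, 0.1, log=True)

    env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
    model = A2C(
        "MlpPolicy", env,
        learning_rate=learning_rate, gamma=gamma, ent_coef=ent_coef,
    )
    model.learn(total_timesteps=100_000)

    # Maximize the mean reward (= success rate) over evaluation episodes
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=50)
    return mean_reward


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```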

MetallicaSPA commented 11 months ago

> Have you tried other algorithms? Hyperparameter tuning? (included in the zoo, or you can have a look at https://araffin.github.io/post/hyperparam-tuning/)

I tried with DQN without any luck. I tried modifying the size of the net (policy and value) and the entropy and value coefficients for the A2C algorithm. Someone in this post mentioned that a tabular Q-learning method would be more efficient than DQN or A2C. I'll check the hyperparameter tuning anyway, but if anyone can point me in the right direction, that would be great. Thanks in advance.
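For comparison, a textbook tabular Q-learning loop on this env would look something like the following (a generic sketch, not the poster's code; hyperparameters are illustrative):

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)
q_table = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # illustrative values

for episode in range(100_000):
    obs, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy exploration
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[obs]))

        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # Standard Q-learning update; no bootstrapping from terminal states
        target = reward + gamma * np.max(q_table[next_obs]) * (not terminated)
        q_table[obs, action] += alpha * (target - q_table[obs, action])
        obs = next_obs
```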

araffin commented 11 months ago

By the way, what do you mean exactly by solving? A reward always equal to 1?

MetallicaSPA commented 11 months ago

> By the way, what do you mean exactly by solving? A reward always equal to 1?

Solving the environment means reaching the finish state. By the way, I implemented tabular Q-learning and it can solve FrozenLake in the symbolic version I implemented (with extra rewards each step; take it as FrozenLake with reward shaping). I still have no clue why a simpler algorithm performs better than A2C, which is supposed to be the better one.
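That kind of per-step shaping can be written as a small Gymnasium wrapper; the bonus below is a made-up placeholder, not the poster's actual shaping scheme:

```python
import gymnasium as gym


class StepBonusWrapper(gym.Wrapper):
    """Illustrative reward shaping: add a small extra reward each step
    (the value is a hypothetical placeholder)."""

    def __init__(self, env: gym.Env, step_bonus: float = 0.01):
        super().__init__(env)
        self.step_bonus = step_bonus

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward + self.step_bonus, terminated, truncated, info


env = StepBonusWrapper(gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True))
```

Note that naive shaping can change the optimal policy (a constant positive per-step bonus rewards long episodes), which is one reason shaped and unshaped results aren't directly comparable.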

araffin commented 11 months ago

> Solving the environment means reaching the finish state.

Yes, but always, or at least in some cases? Also, the env is supposed to be deterministic, but I've observed stochastic behavior...

araffin commented 11 months ago

> I still have no clue why a simpler algorithm performs better than A2C, which is supposed to be the better one.

Simpler doesn't mean worse; tabular Q-learning is tailored for that env.

MetallicaSPA commented 11 months ago

> Yes, but always, or at least in some cases? Also, the env is supposed to be deterministic, but I've observed stochastic behavior...

I'm using the non-deterministic version of the env (is_slippery=True), and it can solve it around 60 times out of 100, approximately. With regular Q-learning, none. Same with A2C.
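One way to measure that kind of success rate (a sketch; the poster's evaluation code isn't shown, and `policy` below is a placeholder for whichever agent is being tested):

```python
import gymnasium as gym

env = gym.make("FrozenLake-v1", map_name="8x8", is_slippery=True)


def policy(obs):
    # Placeholder: random actions. Swap in the trained agent here,
    # e.g. `model.predict(obs, deterministic=True)[0]` for an SB3 model.
    return env.action_space.sample()


n_episodes, successes = 100, 0
for _ in range(n_episodes):
    obs, _ = env.reset()
    done = False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        done = terminated or truncated
    # FrozenLake gives reward 1 only on reaching the goal
    successes += int(reward == 1.0)

print(f"Success rate: {successes}/{n_episodes}")
```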

araffin commented 11 months ago

With those commands, I managed to get ~60% success.

a2c.yaml:

```yaml
FrozenLake-v1:
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
  n_envs: 8
```

```bash
CUDA_VISIBLE_DEVICES= OMP_NUM_THREADS=1 python3 -m rl_zoo3.train --algo a2c --env FrozenLake-v1 --verbose 1 -c a2c.yaml --n-eval-envs 5 --eval-episodes 10 -P -params gamma:0.999 ent_coef:0.01 --env-kwargs map_name:"'8x8'" is_slippery:True --log-interval 1000
```
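For readers who don't use the zoo, a rough Python equivalent of those settings with SB3 directly (my own translation of the command, so treat it as approximate; the zoo also adds evaluation and logging not reproduced here):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env

# 8 parallel envs, gamma and ent_coef as in the command above
vec_env = make_vec_env(
    "FrozenLake-v1",
    n_envs=8,
    env_kwargs={"map_name": "8x8", "is_slippery": True},
)

model = A2C("MlpPolicy", vec_env, gamma=0.999, ent_coef=0.01, verbose=1)
model.learn(total_timesteps=1_000_000)
```
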
MetallicaSPA commented 11 months ago

> With those commands, I managed to get ~60% success.

Thank you for your reply! I'll try it and see if I can replicate these results. Anyway, I think this should be added to the RL Zoo repo.