DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] Agent always gets stuck at the action space lower bound #389

Closed hkuribayashi closed 3 years ago

hkuribayashi commented 3 years ago

Important Note: We do not do technical support, nor consulting and don't answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.

Question

Hi everyone,

When I use the action space configuration below and sample from it directly, I get action vectors with varied values. However, I created a custom gym environment with the same action space configuration and trained it with Stable-Baselines3 using A2C or PPO, and every sampled action seems stuck at the action space lower bound (5.0). I was expecting something like:

```
[21.08086   20.020802  16.812733  23.77745   10.687413  20.424904  15.4278145 26.068079  18.092493  22.096527 ]
[ 5.002933   8.210208  15.631343   5.3958955 29.201706  27.193197  21.82524   25.94392   33.925514  30.831163 ]
```

What am I doing wrong?

```python
import numpy as np
from gym import spaces

low_actions = []
high_actions = []
for _ in range(10):
    low_actions.append(5.0)
    high_actions.append(35.0)

action_space = spaces.Box(low=np.array(low_actions), high=np.array(high_actions))
for _ in range(1000):
    print(action_space.sample())
```

Additional context

Add any other context about the question here.

Checklist

Miffyli commented 3 years ago

Hey. Please fill in the whole issue template, i.e. provide minimal code to reproduce the issue.

It sounds like the issue is related to a custom env and unexpected training results, which is more of a tech-support matter. For that, I recommend checking the links at the beginning of the issue template.

araffin commented 3 years ago

I highly suspect your issue is the one mentioned in our tips and tricks: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment

Please fill in the custom env issue template next time and use the env checker...
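
For reference, a minimal sketch of the rescaling that the tips-and-tricks page recommends: keep the agent-facing action space symmetric and normalized ([-1, 1]), map actions back to the real range ([5, 35]) inside the env, and run the env checker. The class name, observation space, reward, and episode logic below are made-up placeholders, not the original poster's env.

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines3.common.env_checker import check_env


class MyNormalizedEnv(gym.Env):
    """Hypothetical custom env: the agent acts in [-1, 1], the env rescales to [5, 35]."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(10,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.real_low, self.real_high = 5.0, 35.0

    def _rescale_action(self, action):
        # Linear map from [-1, 1] back to the real range [5, 35]
        return self.real_low + 0.5 * (action + 1.0) * (self.real_high - self.real_low)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        real_action = self._rescale_action(action)
        obs = np.zeros(4, dtype=np.float32)
        reward = float(-np.abs(real_action - 20.0).mean())  # placeholder reward
        done = True  # placeholder: one-step episodes
        return obs, reward, done, {}


# The env checker flags API mistakes and non-normalized spaces before any training run.
check_env(MyNormalizedEnv())
```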

hkuribayashi commented 3 years ago

> Hey. Please fill in the whole issue template, i.e. provide minimal code to reproduce the issue.
>
> It sounds like the issue is related to a custom env and unexpected training results, which is more of a tech-support matter. For that, I recommend checking the links at the beginning of the issue template.

Sorry @Miffyli. Should I close this issue? I mean, it was not my intention to cause any trouble.

hkuribayashi commented 3 years ago

> I highly suspect your issue is the one mentioned in our tips and tricks: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment
>
> Please fill in the custom env issue template next time and use the env checker...

@araffin Thank you very much once more. You're a life saver. However, may I ask a complementary question? Considering that tip, should I normalize the observation space even for discrete observation-space states (using stable-baselines3 DQN)? If yes, to something like [0, 1] or [-1, 1]?

araffin commented 3 years ago

> Considering that tip, should I normalize the observation space even for discrete observation-space states (using stable-baselines3 DQN)? If yes, to something like [0, 1] or [-1, 1]?

I think you may be confusing the action and observation spaces. But yes, for observation spaces, as mentioned in the doc, it is always good practice to normalize them ([-1, 1] vs. [0, 1] should not really matter).
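
As a rough illustration of that advice, here is a minimal min-max scaling sketch; the raw observation bounds are hypothetical placeholders and would come from your own env:

```python
import numpy as np

# Hypothetical raw observation bounds for the custom env (made-up values)
OBS_LOW = np.array([0.0, -50.0, 5.0], dtype=np.float32)
OBS_HIGH = np.array([100.0, 50.0, 35.0], dtype=np.float32)


def normalize_obs_01(raw_obs: np.ndarray) -> np.ndarray:
    """Min-max scale a raw observation into [0, 1]."""
    return (raw_obs - OBS_LOW) / (OBS_HIGH - OBS_LOW)


def normalize_obs_sym(raw_obs: np.ndarray) -> np.ndarray:
    """Same idea, into [-1, 1] instead."""
    return 2.0 * normalize_obs_01(raw_obs) - 1.0


print(normalize_obs_01(np.array([50.0, 0.0, 20.0], dtype=np.float32)))  # -> [0.5 0.5 0.5]
```

Alternatively, the `VecNormalize` wrapper from `stable_baselines3.common.vec_env` can maintain a running mean/std normalization of observations automatically during training.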

> Should I close this issue?

If your issue is solved, yes.