Closed · hkuribayashi closed this issue 3 years ago
Hey. Please fill in the entire issue template, i.e. provide minimal code to reproduce the bug.
Sounds like the issue is related to a custom env and unexpected training results, which is more of a tech-support matter. For those I recommend checking the links at the beginning of the issue template.
I highly suspect your issue is the one mentioned in our tips and tricks: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#tips-and-tricks-when-creating-a-custom-environment
Please fill in the custom env issue template next time and use the env checker...
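For reference, the pattern that tips-and-tricks page recommends is a symmetric, normalized action space, with the rescaling to the true range done inside the environment, and the env checker (`check_env` from `stable_baselines3.common.env_checker`) run on the result. A minimal sketch of that pattern, assuming the bounds [5, 35] from this issue; `NormalizedEnv` is a hypothetical stand-in using the old gym API (obs-only `reset`, 4-tuple `step`) that SB3 used at the time:

```python
import gym
import numpy as np
from gym import spaces

class NormalizedEnv(gym.Env):
    """Hypothetical env: the agent acts in [-1, 1], the env rescales to [5, 35]."""

    def __init__(self):
        super().__init__()
        # Symmetric, normalized action space as recommended in the SB3 tips
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(10,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # Rescale the normalized action to the true range [5, 35]
        real_action = 5.0 + 0.5 * (action + 1.0) * (35.0 - 5.0)
        obs = self.observation_space.sample()
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info
```

With this in place, `check_env(NormalizedEnv())` would flag any remaining Gym API violations before training starts.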
Sorry @Miffyli. Should I close this issue? I mean, it was not my intention to cause such trouble.
@araffin Thank you very much once more. You're a life saver. However, may I ask a complementary question? Given that tip, should I normalize the observation space even for discrete observation states (using Stable-Baselines3 DQN)? If yes, to something like [0, 1] or [-1, 1]?
> Given that tip, should I normalize the observation space even for discrete observation states (using Stable-Baselines3 DQN)? If yes, to something like [0, 1] or [-1, 1]?
I think you may be confusing the action and observation space. But yes, for observation spaces, as mentioned in the docs, it is always good practice to normalize them ([-1, 1] vs. [0, 1] should not really matter).
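As a concrete illustration of that advice, here is a minimal numpy sketch of scaling raw observations into [0, 1]; `normalize_obs` and the bounds are hypothetical, not part of the SB3 API:

```python
import numpy as np

def normalize_obs(obs, low, high):
    """Scale raw observations into [0, 1]; [-1, 1] would work equally well."""
    return (np.asarray(obs, dtype=np.float32) - low) / (high - low)

# Hypothetical raw observation bounds for a 2-dimensional observation
low = np.array([0.0, -5.0], dtype=np.float32)
high = np.array([10.0, 5.0], dtype=np.float32)
print(normalize_obs([5.0, 0.0], low, high))  # -> [0.5 0.5]
```

The same function applied in `step()`/`reset()` keeps the observations the policy sees in a fixed, bounded range.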
> Should I close this issue?
If your issue is solved, yes.
Important Note: We do not do technical support, nor consulting and don't answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
Question
Hi everyone,
When I use the following action space configuration and sample from it directly, I get action vectors with varied values. But when I create a custom gym environment with the same action space and train Stable-Baselines3 A2C or PPO on it, every sampled action seems stuck at the action space lower bound (5.0). I was expecting something like:
```
[21.08086  20.020802 16.812733 23.77745  10.687413 20.424904 15.4278145 26.068079 18.092493 22.096527]
[ 5.002933  8.210208 15.631343  5.3958955 29.201706 27.193197 21.82524  25.94392  33.925514 30.831163]
```
What am I doing wrong?
```python
import numpy as np
from gym import spaces

low_actions = []
high_actions = []
for _ in range(10):
    low_actions.append(5.0)
    high_actions.append(35.0)

action_space = spaces.Box(low=np.array(low_actions), high=np.array(high_actions))  # steer, gas, brake
for _ in range(1000):
    print(action_space.sample())
```
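Following the tips-and-tricks advice linked above, the likely fix is to let the agent act in a symmetric normalized range and rescale inside the environment's `step()`. A minimal numpy sketch, assuming the bounds 5.0/35.0 from the question; `rescale_action` is a hypothetical helper, not an SB3 function:

```python
import numpy as np

LOW, HIGH = 5.0, 35.0  # true action bounds from the question

def rescale_action(action):
    """Map a normalized action in [-1, 1] linearly onto [LOW, HIGH]."""
    action = np.clip(action, -1.0, 1.0)
    return LOW + 0.5 * (action + 1.0) * (HIGH - LOW)

print(rescale_action(np.array([-1.0, 0.0, 1.0])))  # -> [ 5. 20. 35.]
```

The environment would then declare `spaces.Box(low=-1, high=1, shape=(10,))` as its action space and call this helper on every incoming action.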
Additional context
Add any other context about the question here.
Checklist