Ian-Sy-Zhang closed this 7 months ago
env.observation_space.sample()
However, all this produces is a possible observation within the bounds of the space, not necessarily a valid observation for the environment. Therefore, it doesn't necessarily generate an observation that satisfies the trigonometric identity cos(theta)^2 + sin(theta)^2 = 1.

Correct code:
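import gymnasium as gym
import numpy as np

env = gym.make("Pendulum-v1")
obs, _ = env.reset()
# Observations returned by the environment satisfy cos^2 + sin^2 = 1
assert np.isclose(obs[0]**2 + obs[1]**2, 1)
for _ in range(100):
    action = env.action_space.sample()
    obs, _, _, _, _ = env.step(action)  # step with the action sampled above
    assert np.isclose(obs[0]**2 + obs[1]**2, 1)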
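
To illustrate why sampling from the bounds breaks the identity, here is a minimal sketch in plain Python that does not use Gymnasium at all. `sample_from_bounds` is a hypothetical stand-in for `observation_space.sample()`, assuming each coordinate is drawn independently and uniformly within its bounds, and `observation_from_state` is a hypothetical helper that builds an observation from an actual angle. The bounds and the tolerance are assumptions chosen for illustration.

```python
import math
import random

# Bounds assumed for illustration: x and y in [-1, 1], angular velocity in [-8, 8].
LOW = (-1.0, -1.0, -8.0)
HIGH = (1.0, 1.0, 8.0)


def sample_from_bounds(rng):
    """Hypothetical stand-in for observation_space.sample():
    every coordinate is drawn independently within its own bounds."""
    return tuple(rng.uniform(lo, hi) for lo, hi in zip(LOW, HIGH))


def observation_from_state(theta, theta_dot):
    """Hypothetical helper: build (cos(theta), sin(theta), theta_dot)
    from a single underlying angle."""
    return (math.cos(theta), math.sin(theta), theta_dot)


def on_unit_circle(obs, tol=1e-6):
    """True if the first two entries satisfy x^2 + y^2 = 1 within tol."""
    return abs(obs[0] ** 2 + obs[1] ** 2 - 1.0) <= tol


rng = random.Random(0)
box_samples = [sample_from_bounds(rng) for _ in range(100)]
state_samples = [
    observation_from_state(rng.uniform(-math.pi, math.pi), rng.uniform(-8.0, 8.0))
    for _ in range(100)
]

# Count how many observations violate the identity in each group.
violations_box = sum(not on_unit_circle(o) for o in box_samples)
violations_state = sum(not on_unit_circle(o) for o in state_samples)
print(violations_box, violations_state)
```

Observations built from one angle always land on the unit circle, while independently drawn x and y values almost never do, because nothing ties the two coordinates together.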
May I ask what reward level indicates good convergence? My A3C agent finds it hard to surpass -200 (for episodes of no more than 200 steps).
Pendulum is a difficult exploration problem, so you might need to explore the environment more.
Question
From the Gymnasium documentation we know that the 0th item in the observation space is 'x = cos(theta)' and the 1st item in the observation space is 'y = sin(angle)'.
I didn't see anything in the documentation saying that 'theta' and 'angle' are two different things. If theta is the same thing as angle, then x^2 + y^2 should be equal to 1.
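
Written out, and assuming 'theta' and 'angle' refer to the same pendulum angle, the expected relation is just the Pythagorean identity:

$$
x^2 + y^2 = \cos^2(\theta) + \sin^2(\theta) = 1
$$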
The result shows that in 100 samples, 78 are incorrect.
So the questions are:
sample function?