DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] state dimension different from original gym. #137

Closed hskAlena closed 3 years ago

hskAlena commented 3 years ago

In the gym FetchPush-v1, env.observation_space['observation'] outputs Box(-inf, inf, (25,), float32)

However, when I print the length of the 'observation' space using enjoy.py, it prints 26:

```
python enjoy.py --algo tqc --env FetchPush-v1
```

```python
print(len(obs['observation'][0]))  # prints 26
```

What makes the difference? I'm using the agent's state output of enjoy.py, so the dimension should be the same. Is this a bug or do I need to post-process the output before using it somewhere else?

System Info Python version is 3.7.9 PyTorch version is 1.7.1

araffin commented 3 years ago

Hello, if you take a look at the TQC hyperparameters, you will see that they use a time feature wrapper (see https://github.com/araffin/rl-baselines-zoo/issues/79 and https://sb3-contrib.readthedocs.io/en/master/common/wrappers.html#timefeaturewrapper), which adds an extra dimension to the observation. The enjoy.py script automatically loads the required wrappers.
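For anyone curious what the wrapper does, here is a minimal numpy sketch of the idea (not the actual sb3-contrib implementation; the helper name `append_time_feature` and `max_steps=50` are illustrative): a scalar encoding the normalized remaining time is concatenated onto the observation vector, turning a 25-dimensional observation into a 26-dimensional one.

```python
import numpy as np

def append_time_feature(obs, step, max_steps):
    """Sketch: append the normalized remaining episode time
    (1.0 at the start, approaching 0.0 at the end) to obs."""
    time_feature = 1.0 - step / max_steps
    return np.concatenate([obs, [time_feature]])

obs = np.zeros(25)  # stand-in for FetchPush's 'observation' part
wrapped = append_time_feature(obs, step=0, max_steps=50)
print(wrapped.shape)  # (26,)
```

This is why the shape reported inside enjoy.py differs from the raw environment's observation space.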

hskAlena commented 3 years ago

Hello! I didn't know there were similar questions previously, thanks for your kind reply! :) I guess I can just drop the last dimension of the state vector then, e.g. `obs['observation'][0][:-1]`.
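To illustrate that slice, here is a small sketch with dummy data (the `(1, 26)` shape matches what enjoy.py returns for a single vectorized environment; the values are illustrative):

```python
import numpy as np

# obs['observation'] from enjoy.py has shape (1, 26): batch of one
# observation with the time feature appended as the last entry.
wrapped_obs = np.arange(26, dtype=np.float32).reshape(1, 26)

original_obs = wrapped_obs[0][:-1]  # drop the time feature
print(original_obs.shape)  # (25,)
```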

Again, thanks a lot!