Closed bkkgbkjb closed 1 month ago
Hi all
Thank you very much for this awesome codebase.
While using it, I unfortunately found that the performance on gymnasium's Hopper-v4 (which is a notoriously challenging env, though) is quite suboptimal.
The highest episodic return on Hopper-v4 is roughly 3600.
But the episodic return of the train envs is at best ~2k and quickly degrades, as shown in the following figure.
Have any of you observed similar issues?
I'm using the latest commit of this code with all default configs. Package versions are:
absl-py==2.1.0
antlr4-python3-runtime==4.9.3
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
chex==0.1.87
cloudpickle==3.1.0
contourpy==1.3.0
cycler==0.12.1
Cython==0.29.37
dbx-stopwatch==1.6
dm-control==1.0.8
dm-env==1.6
dm-tree==0.1.8
einops==0.8.0
etils==1.9.4
Farama-Notifications==0.0.4
fasteners==0.19
flax==0.10.0
fonttools==4.54.1
fsspec==2024.9.0
glfw==2.7.0
grpcio==1.67.0
gym==0.21.0
gymnasium==0.29.0
humanize==4.11.0
hydra-core==1.3.2
idna==3.10
imageio==2.36.0
importlib_resources==6.4.5
jax==0.4.34
jax-cuda12-pjrt==0.4.34
jax-cuda12-plugin==0.4.34
jaxlib==0.4.34
jaxtyping==0.2.34
kiwisolver==1.4.7
labmaze==1.0.6
lxml==5.3.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.1
matplotlib==3.9.2
mdurl==0.1.2
ml_dtypes==0.5.0
msgpack==1.1.0
mujoco==2.3.1
mujoco-py==2.1.2.14
nest-asyncio==1.6.0
numpy==1.26.4
omegaconf==2.3.0
opt_einsum==3.4.0
optax==0.2.3
orbax-checkpoint==0.7.0
packaging==24.1
pandas==1.5.3
pillow==11.0.0
protobuf==5.28.2
pycparser==2.22
pygame==2.6.1
Pygments==2.18.0
PyOpenGL==3.1.7
pyparsing==2.4.7
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
rich==13.9.2
scipy==1.14.1
seaborn==0.13.2
six==1.16.0
tensorboard==2.18.0
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
tensorstore==0.1.66
toolz==1.0.0
tqdm==4.66.5
typeguard==2.13.3
typing_extensions==4.12.2
urllib3==2.2.3
Werkzeug==3.0.4
zipp==3.20.2
For finite-horizon envs like Hopper and Humanoid, you should set predict_continues to true in the world model config. The agent should perform well after that.
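To illustrate why a continuation predictor can matter here, below is a minimal sketch in plain Python. It is not this codebase's implementation: the function name `imagined_return`, the `continues` list, and all the numbers are made up for illustration. The assumption is that, when enabled, the world model predicts a per-step "continue" signal that is multiplied into the discount, so returns stop accumulating after a predicted termination.

```python
def imagined_return(rewards, continues, gamma, bootstrap):
    """Discounted return of a rollout, computed backwards.

    rewards[t]   : reward at step t
    continues[t] : 1.0 if the episode continues after step t, 0.0 if it ends
                   (hypothetical output of a continuation predictor)
    bootstrap    : value estimate for the state after the last step
    """
    ret = bootstrap
    for r, c in zip(reversed(rewards), reversed(continues)):
        # The effective discount is gamma * c, so a predicted termination
        # (c == 0) cuts off everything that would come after it.
        ret = r + gamma * c * ret
    return ret


rewards = [1.0, 1.0, 1.0]

# No continuation prediction: every step is treated as non-terminal.
naive = imagined_return(rewards, [1.0, 1.0, 1.0], gamma=0.5, bootstrap=10.0)

# The episode actually ends after the second step (e.g. the hopper falls).
aware = imagined_return(rewards, [1.0, 0.0, 1.0], gamma=0.5, bootstrap=10.0)

print(naive, aware)  # 3.0 1.5
```

In this toy case, ignoring the termination doubles the estimated return (3.0 vs 1.5), which is the kind of value overestimation that could plausibly hurt an agent on envs where falling ends the episode.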