ShaneFlandermeyer / tdmpc2-jax

Jax/Flax Implementation of TD-MPC2

Sub-optimal performance on gymnasium 'Hopper-v4' #11

Closed: bkkgbkjb closed this issue 1 month ago

bkkgbkjb commented 1 month ago

Hi all

Many thanks for this awesome codebase.

While using it, I unfortunately found that performance on Gymnasium's Hopper-v4 (admittedly a notoriously challenging environment) is quite sub-optimal.

The highest episodic return on Hopper-v4 is roughly 3600.

However, the episodic return of the training envs peaks at ~2k at best and then quickly degrades, as shown in the following figure:

[Figure: episodic return of the training envs on Hopper-v4 over training]

Have any of you observed similar issues?

I'm using the latest commit of this code with all default configs. My package versions are:

absl-py==2.1.0
antlr4-python3-runtime==4.9.3
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
chex==0.1.87
cloudpickle==3.1.0
contourpy==1.3.0
cycler==0.12.1
Cython==0.29.37
dbx-stopwatch==1.6
dm-control==1.0.8
dm-env==1.6
dm-tree==0.1.8
einops==0.8.0
etils==1.9.4
Farama-Notifications==0.0.4
fasteners==0.19
flax==0.10.0
fonttools==4.54.1
fsspec==2024.9.0
glfw==2.7.0
grpcio==1.67.0
gym==0.21.0
gymnasium==0.29.0
humanize==4.11.0
hydra-core==1.3.2
idna==3.10
imageio==2.36.0
importlib_resources==6.4.5
jax==0.4.34
jax-cuda12-pjrt==0.4.34
jax-cuda12-plugin==0.4.34
jaxlib==0.4.34
jaxtyping==0.2.34
kiwisolver==1.4.7
labmaze==1.0.6
lxml==5.3.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.1
matplotlib==3.9.2
mdurl==0.1.2
ml_dtypes==0.5.0
msgpack==1.1.0
mujoco==2.3.1
mujoco-py==2.1.2.14
nest-asyncio==1.6.0
numpy==1.26.4
omegaconf==2.3.0
opt_einsum==3.4.0
optax==0.2.3
orbax-checkpoint==0.7.0
packaging==24.1
pandas==1.5.3
pillow==11.0.0
protobuf==5.28.2
pycparser==2.22
pygame==2.6.1
Pygments==2.18.0
PyOpenGL==3.1.7
pyparsing==2.4.7
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
rich==13.9.2
scipy==1.14.1
seaborn==0.13.2
six==1.16.0
tensorboard==2.18.0
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
tensorstore==0.1.66
toolz==1.0.0
tqdm==4.66.5
typeguard==2.13.3
typing_extensions==4.12.2
urllib3==2.2.3
Werkzeug==3.0.4
zipp==3.20.2
ShaneFlandermeyer commented 1 month ago

For envs with early termination (finite-horizon envs like Hopper and Humanoid), you should set predict_continues to true in the world model config. The agent should perform well after that.
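Some background on why this flag matters: in Hopper-v4 an episode ends early when the robot becomes unhealthy (for example, when it falls), so a world model that assumes every imagined step continues will overvalue trajectories that actually end in a fall. The sketch below illustrates the general idea behind a continue predictor, as used in Dreamer-style world models. It is a minimal sketch under stated assumptions, not the tdmpc2-jax implementation: the function names continue_targets and imagined_return are hypothetical, and the reward and continue arrays stand in for whatever the world model actually predicts.

```python
import jax
import jax.numpy as jnp


def continue_targets(terminated):
    """Hypothetical helper: training target for a continue predictor.

    Returns 0.0 where the env terminated and 1.0 otherwise. Only Gymnasium's
    `terminated` flag should be used here; `truncated` (time limit) is not a
    true end of the episode, so it must not zero out the continue target.
    """
    return 1.0 - terminated.astype(jnp.float32)


def imagined_return(rewards, continues, terminal_value, discount):
    """Hypothetical helper: discounted return of an imagined rollout.

    rewards:        (H,) predicted rewards r_0 .. r_{H-1}
    continues:      (H,) predicted probability the episode continues after step t
    terminal_value: scalar bootstrap value V(z_H)
    discount:       scalar discount factor gamma
    """

    def step(carry, x):
        # scale = gamma^t * prod_{k<t} c_k, the weight applied to r_t
        scale, total = carry
        r, c = x
        total = total + scale * r
        # A predicted termination (c ~ 0) zeroes the weight of all later
        # rewards and of the bootstrap value.
        scale = scale * discount * c
        return (scale, total), None

    init = (jnp.float32(1.0), jnp.float32(0.0))
    (scale, total), _ = jax.lax.scan(step, init, (rewards, continues))
    return total + scale * terminal_value
```

With three unit rewards, a discount of 0.9 and a bootstrap value of 10, all-ones continues give 1 + 0.9 + 0.81 + 0.729 * 10 = 10.0, while continues of [1, 0, 1] cut the sum off after the second step and give 1.9. Without a continue predictor the model effectively always uses the first case, which is one plausible reason returns can be overestimated on terminating tasks like Hopper.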