Sub-optimal performance on gymnasium 'Hopper-v4'

Hi all

Many thanks for this awesome codebase.

While using it, I unfortunately found that the performance on gymnasium's Hopper-v4 (admittedly a notoriously challenging env) is quite sub-optimal.

The highest episodic return on Hopper-v4 is roughly 3600.

But the episodic return on the train envs is at best ~2k and then quickly degenerates, as shown in the following figure.
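To be clear about the metric: by episodic return I mean the undiscounted sum of rewards over one episode. Below is a minimal sketch of that computation, assuming a flat log of per-step rewards and episode-end flags. The function name `episodic_returns` and the log layout are my own illustration, not this codebase's logging code.

```python
def episodic_returns(rewards, dones):
    """Sum rewards per episode; an episode ends where the done flag is True.

    `rewards` and `dones` are hypothetical per-step logs of equal length.
    Steps after the last finished episode are ignored.
    """
    returns = []
    total = 0.0
    for reward, done in zip(rewards, dones):
        total += reward
        if done:
            # Episode finished: record its return and start a new one.
            returns.append(total)
            total = 0.0
    return returns


# Two toy episodes: three steps of reward 1.0, then two steps of reward 2.0.
rewards = [1.0, 1.0, 1.0, 2.0, 2.0]
dones = [False, False, True, False, True]
print(episodic_returns(rewards, dones))  # [3.0, 4.0]
```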

Have any of you observed similar issues?

I'm using the latest commit of this code with all default configs. The package versions are:

absl-py==2.1.0
antlr4-python3-runtime==4.9.3
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
chex==0.1.87
cloudpickle==3.1.0
contourpy==1.3.0
cycler==0.12.1
Cython==0.29.37
dbx-stopwatch==1.6
dm-control==1.0.8
dm-env==1.6
dm-tree==0.1.8
einops==0.8.0
etils==1.9.4
Farama-Notifications==0.0.4
fasteners==0.19
flax==0.10.0
fonttools==4.54.1
fsspec==2024.9.0
glfw==2.7.0
grpcio==1.67.0
gym==0.21.0
gymnasium==0.29.0
humanize==4.11.0
hydra-core==1.3.2
idna==3.10
imageio==2.36.0
importlib_resources==6.4.5
jax==0.4.34
jax-cuda12-pjrt==0.4.34
jax-cuda12-plugin==0.4.34
jaxlib==0.4.34
jaxtyping==0.2.34
kiwisolver==1.4.7
labmaze==1.0.6
lxml==5.3.0
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==3.0.1
matplotlib==3.9.2
mdurl==0.1.2
ml_dtypes==0.5.0
msgpack==1.1.0
mujoco==2.3.1
mujoco-py==2.1.2.14
nest-asyncio==1.6.0
numpy==1.26.4
omegaconf==2.3.0
opt_einsum==3.4.0
optax==0.2.3
orbax-checkpoint==0.7.0
packaging==24.1
pandas==1.5.3
pillow==11.0.0
protobuf==5.28.2
pycparser==2.22
pygame==2.6.1
Pygments==2.18.0
PyOpenGL==3.1.7
pyparsing==2.4.7
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
rich==13.9.2
scipy==1.14.1
seaborn==0.13.2
six==1.16.0
tensorboard==2.18.0
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
tensorstore==0.1.66
toolz==1.0.0
tqdm==4.66.5
typeguard==2.13.3
typing_extensions==4.12.2
urllib3==2.2.3
Werkzeug==3.0.4
zipp==3.20.2
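If anyone wants to compare their environment against the list above, here is a small sketch that prints the installed versions of the packages most likely to matter. It uses only the standard-library `importlib.metadata`. Which packages count as "key" is my assumption, not something the codebase specifies.

```python
from importlib import metadata

# Assumed shortlist of packages relevant to this issue; adjust as needed.
KEY_PACKAGES = ("gymnasium", "mujoco", "jax", "jaxlib", "flax", "optax", "numpy")


def installed_versions(names=KEY_PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

The output can be diffed directly against the pinned versions listed above.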

ShaneFlandermeyer / tdmpc2-jax
