DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.34k stars · 1.6k forks

[Bug]: DDPG cannot stop during training #1711

Closed AvalonGuo closed 8 months ago

AvalonGuo commented 9 months ago

🐛 Bug

When training panda_gym's PickAndPlace env using DDPG, I found that training could not stop using the default settings.

### To Reproduce


### Relevant log output / Error message

```shell
no output
```

### System Info

### Checklist

AvalonGuo commented 9 months ago

```python
model = DDPG(
    "MultiInputPolicy",
    env,
    learning_rate=0.001,
    buffer_size=100000,
    replay_buffer_class=HerReplayBuffer,
    tensorboard_log=log_dir,
    tau=0.05,
    gamma=0.95,
    verbose=1,
)

model.learn(total_timesteps=1.6e6, progress_bar=True)

model.save("ddpg_franka")
```

araffin commented 9 months ago

> i found the train couldn't stop using the default setting

What do you mean exactly by that?

AvalonGuo commented 9 months ago

[training screenshot] Sorry, I didn't express myself clearly enough. The situation is shown in the figure above, when training with DDPG. The code is the same except for the total_timesteps.

araffin commented 8 months ago

Using the following code and the latest versions of gymnasium, SB3, and panda-gym, I cannot reproduce the issue:

```python
import panda_gym  # noqa: F401
from stable_baselines3 import DDPG, HerReplayBuffer

model = DDPG(
    "MultiInputPolicy",
    "PandaPickAndPlace-v3",
    learning_rate=0.001,
    buffer_size=1000,
    replay_buffer_class=HerReplayBuffer,
    tau=0.05,
    gamma=0.95,
    verbose=1,
    learning_starts=100,
    policy_kwargs=dict(net_arch=[64]),
)

model.learn(total_timesteps=1000, progress_bar=True)

model.env.close()
```