DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.34k stars · 1.6k forks

[Bug]: DDPG cannot stop during training #1711

Closed AvalonGuo closed 8 months ago

AvalonGuo commented 9 months ago

🐛 Bug

When training panda_gym's PickAndPlace env using DDPG, I found that training could not stop using the default settings.

### To Reproduce


### Relevant log output / Error message

```shell
no output
```

### System Info

### Checklist

AvalonGuo commented 9 months ago

```python
model = DDPG(
    "MultiInputPolicy",
    env,
    learning_rate=0.001,
    buffer_size=100000,
    replay_buffer_class=HerReplayBuffer,
    tensorboard_log=log_dir,
    tau=0.05,
    gamma=0.95,
    verbose=1,
)

model.learn(total_timesteps=1.6e6, progress_bar=True)

model.save("ddpg_franka")
```

araffin commented 9 months ago

> i found the train couldn't stop using the default setting

What do you mean exactly by that?

AvalonGuo commented 9 months ago

[training screenshot] Sorry, I didn't express myself clearly enough. The situation is shown in the figure above, when training with DDPG. The code is the same except for the total_timesteps.

araffin commented 8 months ago

Using the following code and the latest versions of gymnasium, SB3, and panda-gym, I cannot reproduce the issue:

```python
import panda_gym  # noqa: F401
from stable_baselines3 import DDPG, HerReplayBuffer

model = DDPG(
    "MultiInputPolicy",
    "PandaPickAndPlace-v3",
    learning_rate=0.001,
    buffer_size=1000,
    replay_buffer_class=HerReplayBuffer,
    tau=0.05,
    gamma=0.95,
    verbose=1,
    learning_starts=100,
    policy_kwargs=dict(net_arch=[64]),
)

model.learn(total_timesteps=1000, progress_bar=True)

model.env.close()
```