DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Feature Request] Resume trained model with set_parameters without reset_num_timesteps #1877

Closed: tanielsfranklin closed this issue 3 months ago

tanielsfranklin commented 3 months ago

🚀 Feature

Resume trained model with set_parameters without reset_num_timesteps

Motivation

I think this is useful when training in rounds. It would keep the TensorBoard record continuous even when some training hyperparameters change between rounds.

Pitch

Train for several rounds and have TensorBoard logging remain continuous even when hyperparameters change.

Alternatives

Using the load() method works well, but it doesn't allow changes to batch_size and n_steps.

Additional context

No response

araffin commented 3 months ago

Using the load() method works well, but it doesn't allow changes to batch_size and n_steps.

from stable_baselines3 import PPO

PPO("MlpPolicy", "CartPole-v1").save("ppo_cartpole")

model = PPO.load("ppo_cartpole", n_steps=64, batch_size=32)

?

tanielsfranklin commented 3 months ago

The load method does not accept these parameters. Its signature is: classmethod load(path, env=None, device='auto', custom_objects=None, print_system_info=False, force_reset=True, **kwargs)

araffin commented 3 months ago

Load method does not accept these parameters.

Yet the provided code works...

You should have a look at **kwargs and what it means in Python.
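
For readers unfamiliar with `**kwargs`, here is a minimal sketch of the idea. `ToyModel` and its `load` method are hypothetical stand-ins written for illustration, not the stable-baselines3 implementation. The point is only that a signature ending in `**kwargs` collects any extra keyword arguments, so names like `n_steps` and `batch_size` can be passed even though they are not listed explicitly.

```python
class ToyModel:
    """Hypothetical stand-in for a saved model; not the real SB3 class."""

    def __init__(self, n_steps=2048, batch_size=64):
        self.n_steps = n_steps
        self.batch_size = batch_size

    @classmethod
    def load(cls, saved, device="auto", **kwargs):
        # `saved` plays the role of the attributes stored on disk.
        model = cls()
        model.__dict__.update(saved)
        # Any keyword not named in the signature lands in `kwargs`
        # and overrides the stored value.
        model.__dict__.update(kwargs)
        return model


saved = {"n_steps": 2048, "batch_size": 64}
model = ToyModel.load(saved, n_steps=64, batch_size=32)
print(model.n_steps, model.batch_size)  # prints: 64 32
```

Here `n_steps` and `batch_size` do not appear in the signature of `load`, yet the call succeeds because `**kwargs` accepts them.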

tanielsfranklin commented 3 months ago

Oh, my mistake. I was just thinking about this. It will be very useful, thanks.