DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

What is the proper way to train a model with model loading? #1247

Open muk465 opened 1 year ago

muk465 commented 1 year ago

❓ Question

I want to train my environment on multiple volumes. For that I am using a for loop and changing the image in the environment:

import os
import numpy as np

from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise

for i in range(5):
    label_name = label_files[i]

    env = NeuroRL5(label_name=label_name)
    n_actions = env.action_space.shape[-1]

    action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
    if is_empty_model(model_dir):
        # no checkpoint yet: start from scratch
        model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
    else:
        # build a fresh model on the new env, then overwrite its weights with the saved ones
        load_model = os.listdir(model_dir)[0]
        model_path = os.path.join(model_dir, load_model)
        model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
        model.set_parameters(model_path)

    model.learn(total_timesteps=100000, callback=[CsvCallback(), EpisodeCallback()])
    model.save(save_path)

What I want to ask is regarding this part:

if is_empty_model(model_dir):
    model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
else:
    load_model = os.listdir(model_dir)[0]
    model_path = os.path.join(model_dir, load_model)
    model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
    model.set_parameters(model_path)

model.learn(total_timesteps=100000, callback=[CsvCallback(), EpisodeCallback()])
model.save(save_path)

1) is_empty_model just checks whether the directory already has a saved model file or not; if it does, I just load the model.
2) I want the model trained on the previous volumes to be saved, and I load it using set_parameters as given in the documentation.
3) Is this the correct way to train?

System info: My environment consists of a 3D numpy array which has obstacles and a target. My plan is to make my agent, which follows an action model, reach the target.

How the library was installed: !pip install stable-baselines3

OS: Linux-5.13.0-41-generic-x86_64-with-glibc2.29 #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022
Python: 3.8.10
Stable-Baselines3: 1.6.2
PyTorch: 1.13.1+cu117
GPU Enabled: False
Numpy: 1.21.2
Gym: 0.21.0


araffin commented 1 year ago

i want the model trained on previous volumes to be saved and i load it using setparameters as given in the documentation

Why are you not using DDPG.load() (as shown in the doc)?
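For reference, a minimal sketch of that DDPG.load() pattern, assuming env, action_noise, model_path and the helpers are defined as in the snippets above:

from stable_baselines3 import DDPG

env = NeuroRL5(label_name=label_name)
if is_empty_model(model_dir):
    # no checkpoint yet: start a fresh agent
    model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
else:
    # rebuild the full agent from the saved zip and attach the new env
    model = DDPG.load(model_path, env=env)
model.learn(total_timesteps=100000)
model.save(save_path)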

my environment on multiple volumes

Why are you not using a VecEnv for that? That way you can train on the 5 envs at the same time.
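A minimal sketch of that suggestion, reusing NeuroRL5 and label_files from the snippets above; note that with more than one env the base noise has to be wrapped in VectorizedActionNoise:

import numpy as np

from stable_baselines3 import DDPG
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise, VectorizedActionNoise
from stable_baselines3.common.vec_env import DummyVecEnv

n_envs = 5
# one env per volume; the i=i default argument avoids the late-binding closure pitfall
env = DummyVecEnv([lambda i=i: NeuroRL5(label_name=label_files[i]) for i in range(n_envs)])

n_actions = env.action_space.shape[-1]
base_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
action_noise = VectorizedActionNoise(base_noise, n_envs=n_envs)

model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=100000)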

muk465 commented 1 year ago

1) set_parameters loads the parameters into the existing model instantly, as written in the doc, whereas load builds the model from scratch. I don't know which one is better, but it's given in the documentation; kindly confirm.
2) Because DDPG with a vectorised env is not allowing episodic learning when I vectorise the noise, as in issue #1230.

muk465 commented 1 year ago

Also, as you can see, I want to update the model parameters instantly when the volume changes; that's why I used set_parameters.

araffin commented 1 year ago

In your case, the env observation/action space sizes stay the same, no? So you should probably be using set_env(), in which case no loading is needed for the agent (unless you want to train 5 different agents).
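A minimal sketch of that set_env pattern, assuming the observation/action spaces really are identical across volumes and reusing NeuroRL5, label_files and action_noise from the snippets above:

# create the agent once, then swap the environment between learn() calls
model = DDPG('MlpPolicy', NeuroRL5(label_name=label_files[0]), action_noise=action_noise, verbose=1)

for label_name in label_files:
    model.set_env(NeuroRL5(label_name=label_name))
    # reset_num_timesteps=False keeps one continuous timestep counter,
    # so each call adds another 100000 steps on the current volume;
    # note the replay buffer carries over between volumes
    model.learn(total_timesteps=100000, reset_num_timesteps=False)

model.save(save_path)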

muk465 commented 1 year ago

1) Action space and observation space sizes remain the same.
2) I just change the label name, which changes the image inside the env, hence changing the env.

muk465 commented 1 year ago

3) I have not used set_env:

for i in range(20):
    label_name = label_files[i]

    env = NeuroRL5(label_name=label_name)
    img_path = os.path.join(img_folder, label_name)
    img_arr = nib.load(img_path).get_fdata()  # volume is loaded here but not used in this snippet
    n_actions = env.action_space.shape[-1]
    action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
    model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)

    if not is_empty_model(model_dir):
        load_model = os.listdir(model_dir)[0]
        model_path = os.path.join(model_dir, load_model)
        model.set_parameters(model_path)

    model.learn(total_timesteps=100000, callback=[CsvCallback(), EpisodeCallback(), ResetCallback()])

    model.save(save_path)
muk465 commented 1 year ago

My logic is: I create a model with model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1), then it checks whether a saved model is there or not. If there is, it loads the weights with set_parameters, and then I train the model with learn. This model is saved again, and then the loop is repeated.

muk465 commented 1 year ago

@araffin @qgallouedec kindly comment on this issue:
1) Should I use set_env to change the env inside the model, since the action and obs spaces remain the same? Then I would not need to save and load the model again and again, right?
2) But suppose I want to train each env for 1000 steps; then I would have to give 10 times the number of steps in model.learn? e.g. 10000
3) If I use set_env, how do I ensure that each env will be trained equally?
4) And in the above code, how do I incorporate set_env? Where do I put it after removing model.set_parameters?
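Regarding question 4, a minimal sketch of where set_env could slot in once the save/set_parameters round trip is removed, reusing NeuroRL5, label_files, action_noise and the callbacks from the code above:

model = None
for i in range(20):
    env = NeuroRL5(label_name=label_files[i])
    if model is None:
        # first volume: create the agent once
        model = DDPG('MlpPolicy', env, action_noise=action_noise, verbose=1)
    else:
        # subsequent volumes: swap the env in place, no save/load round trip
        model.set_env(env)
    # reset_num_timesteps=False: each call adds its own 100000-step budget
    model.learn(total_timesteps=100000, reset_num_timesteps=False,
                callback=[CsvCallback(), EpisodeCallback(), ResetCallback()])
model.save(save_path)

Since learn() is called once per volume with the same step budget, each env receives an equal share of training, which also addresses question 3.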