DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

transferable models #70

Closed: blurLake closed this issue 3 years ago

blurLake commented 4 years ago

Hi, is there a way to save a model with only the internal layers of the NN? Or is it possible to modify a trained model (the zip file) so that it can be retrained with a different action dimension?

Consider the following example: say we have `env_1` with action dimension 10, and `env_2`, a more complicated version of `env_1`, with action dimension 20. A model is trained on `env_1`. Can we modify that model (the zip file) and use it as the initial value for training on `env_2` afterwards?

Thank you very much!

Miffyli commented 4 years ago

You can do this with the original stable-baselines using `get_parameters` and `load_parameters`, plus a bit of manual tinkering. You need to manually create the mismatching parameter arrays for the `env_2` agent and update the correct entries with the ones from the `env_1` agent. E.g. if only the last fully-connected layer changes, you need to manually create `final_params = np.zeros((N, 20))` and then assign the original weights into it with `final_params[:, :10] = original_params`. If you want to modify saved .zip files, the format is specified here.
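A minimal sketch of that parameter surgery, assuming the original stable-baselines (TF1-based) API; the file name, environment id, and the shape-matching heuristic are illustrative assumptions, not a definitive recipe:

```python
import gym
import numpy as np
from stable_baselines import PPO2

model_1 = PPO2.load("model_env1.zip")    # trained on env_1 (10 actions), hypothetical file
env_2 = gym.make("Env2-v0")              # hypothetical env with the wider action space
model_2 = PPO2("MlpPolicy", env_2)       # fresh agent for env_2 (20 actions)

params_1 = model_1.get_parameters()      # OrderedDict: parameter name -> np.ndarray
params_2 = model_2.get_parameters()

new_params = {}
for name, value in params_2.items():
    old = params_1.get(name)
    if old is not None and old.shape == value.shape:
        new_params[name] = old           # shapes match: reuse the env_1 weights as-is
    elif old is not None and old.ndim == value.ndim:
        # Mismatching layer (e.g. the final fully-connected layer widened 10 -> 20):
        # keep the fresh initialization and copy the old weights into the leading slice.
        widened = value.copy()
        widened[..., : old.shape[-1]] = old
        new_params[name] = widened
    else:
        new_params[name] = value         # no counterpart: keep the fresh initialization

model_2.load_parameters(new_params)
```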

Similar support is planned / partially working in SB3, but still needs to go through a check and review.

bektaskemal commented 4 years ago

So, is exporting a saved model as a PyTorch model not supported yet in SB3? Is there a way to get the model parameters, as in stable-baselines' `get_parameters` function?

araffin commented 4 years ago

> So, is exporting a saved model as a PyTorch model not supported yet in SB3?

It is for the policy (but not properly documented yet). Policies are `nn.Module` instances anyway.
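For illustration, a minimal sketch of what that allows, assuming a standard SB3 agent (the env id and file name are placeholders):

```python
import torch
from stable_baselines3 import SAC

model = SAC("MlpPolicy", "Pendulum-v0")

# The policy is a plain torch.nn.Module, so standard PyTorch tooling applies
torch.save(model.policy.state_dict(), "policy.pth")

# Load the weights back into a freshly created agent
model_2 = SAC("MlpPolicy", "Pendulum-v0")
model_2.policy.load_state_dict(torch.load("policy.pth"))
```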

bektaskemal commented 4 years ago

Thanks for the reply. I have a related question: I was considering exporting the actor network only, but I noticed that `model.predict` and `model.actor.predict` return different values. Is this expected behavior?

```python
import gym
from stable_baselines3 import SAC

# Note: SAC requires a continuous action space, so Pendulum rather than CartPole
model = SAC('MlpPolicy', 'Pendulum-v0', verbose=1)
env = gym.make('Pendulum-v0')
obs = env.reset()
print(model.predict(obs))
print(model.actor.predict(obs))
```

Returns:

```
(array([-0.20559013], dtype=float32), None)
(array([-0.9510132], dtype=float32), None)
```

Edit: My mistake. It works as expected when both are called with `deterministic=True`.
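For reference, a sketch of the calls that agree (both `predict` methods accept a `deterministic` flag):

```python
# With deterministic=True both return the mean action instead of sampling,
# which is why the stochastic calls above returned different values.
print(model.predict(obs, deterministic=True))
print(model.actor.predict(obs, deterministic=True))
```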

araffin commented 3 years ago

> Similar support is planned / partially working in SB3, but still needs to go through a check and review.

Done in #138 (will merge today).