Closed joelmichelson closed 1 year ago
Hello,
net_arch = [
    # dummy values just to show up clearly in the print statement
    {'activation_fn': th.nn.ReLU, 'pi': [32, 32, 32, 32], 'vf': [33, 32, 32, 32]},
    {'lstm': 55},
    {'activation_fn': th.nn.ReLU, 'pi': [25], 'vf': [26]},
]
Where did you see that syntax?
The net_arch argument must be a dictionary as of SB3 v1.8.0 (we made that change to simplify the code and remove inconsistent behavior).
To understand what you can change, the best approach is to look at the policy's constructor arguments: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/aacded79c5e8357545fd94999c2a18cb8f285cb6/sb3_contrib/common/recurrent/policies.py#L83-L87
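For reference, here is a minimal sketch of what a dictionary-style configuration might look like. It assumes the recurrent policy accepts `lstm_hidden_size`, `n_lstm_layers`, and `enable_critic_lstm` as separate keyword arguments rather than as `net_arch` entries, and that the LSTM sits between the features extractor and the `pi`/`vf` MLP heads. The layer sizes, the `make_model` helper, the `"MlpLstmPolicy"` choice, and the `CartPole-v1` environment are illustrative assumptions, not taken from the original question; please check them against the linked source for your installed version.

```python
# Sizes of the MLP heads for the actor ("pi") and critic ("vf").
# As of SB3 v1.8.0 this must be a dict, not a list of dicts.
net_arch = {"pi": [32, 32], "vf": [32, 32]}

# LSTM settings are passed as separate policy keyword arguments
# (assumed names, see the linked policies.py), not inside net_arch.
lstm_settings = {
    "lstm_hidden_size": 55,      # hidden units per LSTM layer
    "n_lstm_layers": 1,          # number of stacked LSTM layers
    "enable_critic_lstm": True,  # give the critic its own LSTM
}


def make_model(env_id="CartPole-v1"):
    """Hypothetical helper: build a RecurrentPPO model from the settings above."""
    # Imported lazily so the configuration can be inspected without the libraries.
    import torch as th
    from sb3_contrib import RecurrentPPO

    policy_kwargs = dict(
        net_arch=net_arch,
        activation_fn=th.nn.ReLU,  # set once at the top level, not per layer group
        **lstm_settings,
    )
    return RecurrentPPO("MlpLstmPolicy", env_id, policy_kwargs=policy_kwargs)
```

Calling `print(make_model().policy)` should then show the LSTM and MLP layers, which makes it easy to confirm the sizes were picked up.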
I apologize for the confusion. I'm not sure why I was using a list for net_arch.
❓ Question
I am attempting to use recurrent policies as follows:
I later pass this policy to the RecurrentPPO class at initialization. However, when I print the policy, I get a summary that uses my CustomCNN implementation correctly but does not appear to use all of the net_arch:
So the first item in net_arch is correct, aside from activation_fn not being set. But the subsequent items don't appear to do anything.
I'm not sure if I'm misunderstanding how the recurrent policy is set up and/or if I'm initializing it totally incorrectly. Are the LSTM layers in this policy summary being unused, as it appears, or is the real architecture different from this summary (do lstm layers just go between features and mlp always)? If not, how can I write a working net_arch?