Closed rockingapple closed 3 years ago
Hello,
I found that the guide does not explain how to pass the current observation and action into a custom policy network.
I'm not sure I get what you mean or what you want to do with obs and action... The code you are showing does not look like what is shown in the doc (and the link you gave points to custom envs, not custom policies); it corresponds to the critic for DDPG (when using continuous actions).
You may check that issue: https://github.com/DLR-RM/stable-baselines3/issues/285 and the documentation for off-policy custom networks here: https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html#off-policy-algorithms
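For reference, the gist of that doc section is something like this (a minimal sketch; the layer sizes and the environment name are placeholders):

```python
from stable_baselines3 import TD3

# Separate network sizes for the actor (pi) and the critic (qf),
# as described in the off-policy custom network section of the docs.
policy_kwargs = dict(net_arch=dict(pi=[400, 300], qf=[400, 300]))

# "Pendulum-v1" is only a placeholder environment for this example.
model = TD3("MlpPolicy", "Pendulum-v1", policy_kwargs=policy_kwargs, verbose=1)
```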
I am sorry, I gave the wrong link; the right link is here, and it lacks flexibility when customizing the policy network. Yes, you are right, the code I showed above is not from the Stable Baselines3 docs. It is my paper code, which comes from here. My paper code is based on it with some modifications.
I plan to migrate my code to Stable Baselines3, and for that I need a more flexible custom policy network. For instance, in my paper the observation is the combination of X and Action, where X is a price tensor and Action is a weight vector. In my paper's DDPG algorithm, the input of the Actor is the price tensor X, and the input of the Critic is the combination of X and Action. That is to say, the inputs of the Actor and the Critic in DDPG are different. However, according to the Stable Baselines3 docs, I cannot give the Actor and the Critic different inputs when customizing them for DDPG. What should I do?
the input of the Actor is the price tensor X, and the input of the Critic is the combination of X and Action. That is to say, the inputs of the Actor and the Critic in DDPG are different.
Well, that's the definition of the actor and the critic... and that's already the case in SB3. See TD3 (an improved version of DDPG):
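To make that concrete, here is a minimal PyTorch sketch (not the actual SB3 source; dimensions are placeholders): the actor maps the observation to an action, while the critic maps the concatenation of observation and action to a Q-value.

```python
import torch as th
import torch.nn as nn

obs_dim, action_dim = 8, 2  # placeholder dimensions

# Actor: observation -> action
actor = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim), nn.Tanh(),
)

# Critic: (observation, action) -> Q-value
critic = nn.Sequential(
    nn.Linear(obs_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

obs = th.randn(1, obs_dim)
action = actor(obs)                             # the actor only sees the observation
q_value = critic(th.cat([obs, action], dim=1))  # the critic sees observation AND action
```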
Thank you. I hope the SB3 docs will soon add a tutorial section on customizing the actor and the critic.
Again, as mentioned in https://github.com/DLR-RM/stable-baselines3/issues/285, I'm not sure what is missing from the documentation:
Thank you for being so patient in answering me. I mean that I want to operate directly on the input (observation) inside the custom policy network, that is to say, the observation should be a parameter of the CustomNetwork class. If, as the doc shows, one can only change the number of layers and units, that is not enough to customize a policy in SB3. You guys really did a great job creating SB3, which is very attractive to me; I just want it to be more flexible.
I mean that I want to operate directly on the input (observation) inside the custom policy network, that is to say, the observation should be a parameter of the CustomNetwork class
I still don't get your point. The observation is passed to both the actor and the critic (and the critic gets the action in addition). Both objects also have access to the observation and action spaces.
If, as the doc shows, one can only change the number of layers and units, that is not enough to customize a policy in SB3.
If you want to modify the observation, then use a gym wrapper; if you want more flexibility, you have the feature extractor (did you take a look at the example?).
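For the wrapper option, a hypothetical wrapper along these lines (old gym API, Box spaces assumed, names and dtype handling are just for illustration) could, for instance, append the last action to the observation:

```python
import gym
import numpy as np

class AppendActionWrapper(gym.Wrapper):
    """Illustrative wrapper: append the previously taken action to the observation."""

    def __init__(self, env):
        super().__init__(env)
        obs_space, act_space = env.observation_space, env.action_space  # both assumed to be Box
        low = np.concatenate([obs_space.low.ravel(), act_space.low.ravel()])
        high = np.concatenate([obs_space.high.ravel(), act_space.high.ravel()])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # No action taken yet, so pad with zeros
        zero_action = np.zeros(self.env.action_space.shape, dtype=np.float32)
        return np.concatenate([np.asarray(obs).ravel(), zero_action]).astype(np.float32)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        obs = np.concatenate([np.asarray(obs).ravel(), np.asarray(action).ravel()])
        return obs.astype(np.float32), reward, done, info
```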
If you want to do something even more fancy (let's say, passing the action at different stages of the critic using residual connections), then please take a look at the developer guide, then the code (which is commented), and then derive a custom policy object.
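And for the feature-extractor option mentioned above, a minimal sketch using the documented BaseFeaturesExtractor hook (the class name, sizes, and environment are placeholders):

```python
import gym
import numpy as np
import torch as th
import torch.nn as nn
from stable_baselines3 import TD3
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class CustomExtractor(BaseFeaturesExtractor):
    """Placeholder extractor: a small MLP over a Box observation."""

    def __init__(self, observation_space: gym.spaces.Box, features_dim: int = 64):
        super().__init__(observation_space, features_dim)
        n_input = int(np.prod(observation_space.shape))
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_input, features_dim),
            nn.ReLU(),
        )

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.net(observations)

policy_kwargs = dict(
    features_extractor_class=CustomExtractor,
    features_extractor_kwargs=dict(features_dim=64),
)
# "Pendulum-v1" is only a placeholder environment.
model = TD3("MlpPolicy", "Pendulum-v1", policy_kwargs=policy_kwargs)
```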
I just want it to be more flexible.
But in the end, the actor must output a valid action and the critic must output an action-value, so there is only so much flexibility you can have anyway.
If you want to modify the observation, then use a gym wrapper; if you want more flexibility, you have the feature extractor (did you take a look at the example?).
This is really inspiring, I will give it a try, thank you.
Question
When I followed the tutorial to customize the policy network, I found that the guide does not explain how to pass the current observation and action into a custom policy network.
Here is the code, for example:
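A minimal sketch of its shape (layer names and sizes are placeholders):

```python
import torch as th
import torch.nn as nn

class Critic(nn.Module):
    """Sketch of a DDPG-style critic: x is the current observation, action is the current action."""

    def __init__(self, obs_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim + action_dim, hidden_dim)
        self.q_out = nn.Linear(hidden_dim, 1)

    def forward(self, x, action):
        # x: current observation at the time the agent takes `action`
        h = th.relu(self.fc1(x))
        h = th.relu(self.fc2(th.cat([h, action], dim=1)))
        return self.q_out(h)
```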
As the code above shows, x is the current observation when the agent takes an action. However, according to stable-baselines3's official guide, I cannot find a way to feed the current observation and action into the policy network. Is there any way to make policy customization more flexible? I want to insert the current observation and action into the custom policy network. I hope someone can give me an answer, thanks.
Checklist
[x] I have read the documentation (required)
[x] I have checked that there is no similar issue in the repo (required)