Closed: liruiluo closed this issue 1 year ago
If that's the case, it looks like the CNN can't update its parameters, because it can't be optimized by the reinforcement learning algorithm

Could you elaborate? I'm not sure I get your point.
(And please use a meaningful title for the issue.)
Now that we have a meaningful title, let's take the REINFORCE algorithm as an example. What I mean is that if the CNN and the MLP are two independent networks rather than one connected network, the MLP can still be updated: it outputs the actions, so a reinforcement learning algorithm such as REINFORCE can compute the loss (-log_prob * Gt) and, from it, the gradient.
However, the CNN would only output a visual representation, and that representation has no ground-truth target (no loss of the form loss = (predict, real)), so no loss can be computed for it. Without a loss there is no gradient, and the CNN's parameters could not be updated.
On the contrary, when the CNN and the MLP are connected into a single network, they can be updated together (the parameters of both the CNN and the MLP are updated directly through the REINFORCE gradient), and this problem does not exist.
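For concreteness, here is a minimal PyTorch sketch of that last point (not SB3 code; the layer sizes, the 4-action space, and the dummy return `Gt` are illustrative assumptions): because the CNN and the MLP are chained in one forward pass, the REINFORCE loss back-propagates through both modules.

```python
import torch
import torch.nn as nn

# Visual feature extractor: has no target of its own, only produces features.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Flatten(),  # 16 * 20 * 20 = 6400 features for an 84x84 input
)
# Policy head: maps features to action logits.
mlp = nn.Sequential(
    nn.Linear(6400, 64),
    nn.ReLU(),
    nn.Linear(64, 4),  # 4 discrete actions (arbitrary choice for this sketch)
)
optimizer = torch.optim.Adam(list(cnn.parameters()) + list(mlp.parameters()), lr=1e-3)

obs = torch.randn(1, 3, 84, 84)             # dummy image observation
logits = mlp(cnn(obs))                      # one computation graph: CNN -> MLP
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
Gt = 1.0                                    # dummy return, for illustration only

loss = -(dist.log_prob(action) * Gt).sum()  # REINFORCE loss: -log_prob * Gt
optimizer.zero_grad()
loss.backward()                             # gradients flow into both mlp and cnn
optimizer.step()                            # both networks are updated together
```

If `cnn.parameters()` were left out of the optimizer (or the CNN were trained completely separately), its weights would indeed never change, which is the situation described above.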
Yes, and that's the case in SB3. CNN vs MLP policy in SB3 mostly refers to the "features extractor", which can be shared or not between the actor and the critic. As shown in the doc, each network is decomposed into two parts: a features extractor + an MLP (for the CNN policy, just a linear layer by default, which can be adjusted); both parts are learned.
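To illustrate, here is a sketch of a custom features extractor along the lines of the example in the custom policy documentation (the `CustomCNN` class name, layer sizes, `features_dim=128`, `net_arch=[64]`, and the Atari env id are illustrative choices; the `spaces` import depends on whether your SB3 version uses gym or gymnasium). Both the features extractor and the `net_arch` MLP that sits on top of it belong to the same policy, so both receive gradients from the algorithm's loss.

```python
import torch as th
import torch.nn as nn
from gymnasium import spaces  # use `from gym import spaces` on older SB3 versions

from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class CustomCNN(BaseFeaturesExtractor):
    """CNN features extractor: image observation -> feature vector of size features_dim."""

    def __init__(self, observation_space: spaces.Box, features_dim: int = 128):
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with one forward pass on a sample observation.
        with th.no_grad():
            n_flatten = self.cnn(
                th.as_tensor(observation_space.sample()[None]).float()
            ).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        return self.linear(self.cnn(observations))


policy_kwargs = dict(
    features_extractor_class=CustomCNN,
    features_extractor_kwargs=dict(features_dim=128),
    net_arch=[64],  # the MLP part on top of the extracted features
)
# The features extractor and the MLP head form one computation graph,
# so the algorithm's loss updates the parameters of both.
model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", policy_kwargs=policy_kwargs, verbose=1)
model.learn(1_000)
```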
❓ Question
This is a great job! However, I have a small doubt about the custom policy. After reading the documentation of the "custom policy" module, it seems that the CNN and the MLP are not directly connected together as one network, but rather two networks spliced together. If that's the case, it looks like the CNN can't update its parameters, because it can't be optimized by the reinforcement learning algorithm. Or is it not what I thought?
Checklist