It has been reported in several studies that deep RL agents become unstable when trained with larger networks. This runs counter to our intuition from recent progress in computer vision, such as ViT, where larger and more complex network architectures have consistently achieved better performance.
Sutton identifies a "deadly triad" of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can be unstable and potentially diverge, with the value estimates becoming unbounded. Several prior works have attempted to mitigate this problem.
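The deadly triad can be demonstrated with a tiny toy problem (a sketch in the spirit of the classic "w → 2w" counterexample, not from the paper itself): two states with linear features, zero rewards, and an off-policy sampling scheme that only ever observes one transition. Semi-gradient TD(0) then drives the single weight away from the true value of 0.

```python
# Toy "w -> 2w" example: two states A, B with linear features
# phi(A) = 1.0, phi(B) = 2.0 and reward 0 everywhere, so the true
# value function is identically zero. If the behavior policy only
# ever samples the A -> B transition (off-policy coverage mismatch),
# semi-gradient TD(0) with bootstrapping diverges.
phi = {"A": 1.0, "B": 2.0}
gamma, alpha = 0.99, 0.1
theta = 1.0  # single weight; V(s) = theta * phi[s]

history = []
for step in range(200):
    # Only the A -> B transition is ever sampled.
    td_error = 0.0 + gamma * theta * phi["B"] - theta * phi["A"]
    theta += alpha * td_error * phi["A"]  # semi-gradient TD(0) update
    history.append(theta)

# Each update multiplies theta by (1 + alpha * (gamma * 2 - 1)) > 1,
# so theta grows exponentially instead of converging to 0.
print(f"theta after 200 updates: {history[-1]:.3e}")
```

Removing any one leg of the triad (e.g. sampling on-policy so both transitions appear with their true frequencies, or using a tabular value per state) restores convergence in this example.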
This paper tries to mitigate the part of the problem related to function approximation. Previous work (e.g., simply making MLP/CNN networks larger) concluded that larger networks tend to perform better, but also become more unstable and more prone to divergence. For on-policy methods, networks that are too small or too large can cause a significant drop in policy performance.
To build a large network, the paper proposes several techniques (see the reference below).
https://arxiv.org/abs/2102.07920