It has been reported in several studies that deep RL agents become unstable when trained with larger networks. This runs counter to our intuition from recent progress in computer vision, such as ViT, where larger and more complex network architectures have consistently achieved better performance.
Sutton identifies a "deadly triad" of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can be unstable and potentially diverge, with the value estimates becoming unbounded. Several prior works have attempted to mitigate this problem.
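The deadly triad can be demonstrated with a tiny toy problem (a sketch in the spirit of the classic "w → 2w" counterexample, not from the paper itself): two states with linear features, zero rewards, and an off-policy sampling scheme that only ever observes one transition. Semi-gradient TD(0) then drives the single weight away from the true value of 0.

```python
# Toy "w -> 2w" example: two states A, B with linear features
# phi(A) = 1.0, phi(B) = 2.0 and reward 0 everywhere, so the true
# value function is identically zero. If the behavior policy only
# ever samples the A -> B transition (off-policy coverage mismatch),
# semi-gradient TD(0) with bootstrapping diverges.
phi = {"A": 1.0, "B": 2.0}
gamma, alpha = 0.99, 0.1
theta = 1.0  # single weight; V(s) = theta * phi[s]

history = []
for step in range(200):
    # Only the A -> B transition is ever sampled.
    td_error = 0.0 + gamma * theta * phi["B"] - theta * phi["A"]
    theta += alpha * td_error * phi["A"]  # semi-gradient TD(0) update
    history.append(theta)

# Each update multiplies theta by (1 + alpha * (gamma * 2 - 1)) > 1,
# so theta grows exponentially instead of converging to 0.
print(f"theta after 200 updates: {history[-1]:.3e}")
```

Removing any one leg of the triad (e.g. sampling on-policy so both transitions appear with their true frequencies, or using a tabular value per state) restores convergence in this example.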
This paper tries to mitigate the part of the problem related to function approximation. Previous work (e.g., simply making MLP/CNN networks larger) concluded that larger networks tend to perform better, but also become more unstable and more prone to divergence. For on-policy methods, networks that are too small or too large can cause a significant drop in policy performance.
To build a large network, the paper proposes several techniques (see the reference below).
https://arxiv.org/abs/2102.07920