The comparison between humans and current DRL algorithms shows a huge difference in sample efficiency (how many samples are needed to achieve a certain level of performance): humans learn far faster than current DRL algorithms, so there are many interesting scientific questions here:
Learning speed is an important limiting factor to overcome in order to move DRL outside the niche of games and into more realistic situations
From the paper: "To attain expert human-level performance on tasks such as Atari video games or chess, deep RL systems have required many orders of magnitude more training data than human experts themselves [22]. The critique is indeed applicable to the first wave of deep RL methods, reported beginning around 2013 (e.g., [25]). However, even in the short time since then, important innovations have occurred in deep RL research, which show how the sample efficiency of deep RL can be dramatically increased."
Sources of slow learning
One source can be framed in terms of the exploration vs. exploitation trade-off (illustrated, together with the next point, in the sketch after this list)
Learning via gradient-based methods means smaller and smaller increments, so that each update can be checked not to “break” what has already been learned; this keeps the step size, and hence the learning speed, small
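A minimal sketch (not from the paper; the environment size and hyperparameters are illustrative) of both points above: a tabular Q-learning agent where epsilon-greedy action selection trades exploration against exploitation, and a small learning rate `ALPHA` keeps each incremental update from overwriting what has already been learned.

```python
import random

N_STATES, N_ACTIONS = 10, 4
ALPHA = 0.01    # small learning rate: stable but slow incremental updates
GAMMA = 0.99    # discount factor
EPSILON = 0.1   # exploration rate: probability of trying a random action

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def select_action(state):
    """Epsilon-greedy: mostly exploit the current estimate, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)                    # explore
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])   # exploit

def update(state, action, reward, next_state):
    """Incremental TD update: nudge Q(s, a) a fraction ALPHA toward the target."""
    target = reward + GAMMA * max(Q[next_state])
    Q[state][action] += ALPHA * (target - Q[state][action])
```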
Goals
Furthermore, since there is no prior knowledge about the loss function landscape and the state space is too large for a detailed exploration, the preference is for continuous small improvements that avoid breaking anything; learning therefore becomes greedier (exploration carries an increasing risk of losing what the NN has accumulated) and slower, as sketched below
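This "greedier and slower" dynamic can be made concrete with annealing schedules (a common DRL practice, not something prescribed by the paper; the linear shape and constants below are assumptions): both the exploration rate and the step size shrink as the network accumulates knowledge.

```python
def epsilon_at(step, start=1.0, end=0.05, decay_steps=100_000):
    """Linearly anneal exploration: the agent becomes greedier with experience."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

def alpha_at(step, start=0.1, end=0.001, decay_steps=100_000):
    """Linearly anneal the step size: later updates risk less of what was learned."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```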
Inductive Bias represents the initial assumptions about the pattern to be learned
Neural Networks are very general learning machines (weak inductive bias); as a consequence, training them takes a lot of time and computational power
Deep Neural Networks are a class of Neural Networks relying on a hierarchical-structure Inductive Bias, which makes them effective at solving computer vision tasks
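A minimal sketch of how that hierarchical inductive bias shows up in practice (assuming PyTorch and 84x84 grayscale inputs, as in common Atari preprocessing; the layer sizes are illustrative): a convolutional network composes local, weight-shared features hierarchically, and so needs far fewer parameters than a generic fully connected network on the same input.

```python
import torch.nn as nn

def n_params(model):
    return sum(p.numel() for p in model.parameters())

mlp = nn.Sequential(                 # weak inductive bias: every pixel
    nn.Flatten(),                    # connected to every hidden unit
    nn.Linear(84 * 84, 512), nn.ReLU(),
    nn.Linear(512, 4),
)

cnn = nn.Sequential(                 # hierarchical bias: small local filters,
    nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),   # weights shared
    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # across space
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 4),        # 9x9 feature map after the two convs
)

print(f"MLP params: {n_params(mlp):,}")   # ~3.6M
print(f"CNN params: {n_params(cnn):,}")   # ~20K
```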
Sample efficiency refers to the amount of data required for a learning system to attain any chosen target level of performance.
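A minimal sketch of this definition in code: given a logged learning curve (cumulative samples consumed vs. evaluation score), sample efficiency at a chosen target is the number of samples needed to first attain it. The curve values below are made up for illustration.

```python
from __future__ import annotations

def samples_to_target(curve: list[tuple[int, float]], target: float) -> int | None:
    """Return the sample count at which `target` performance is first attained."""
    for n_samples, score in curve:
        if score >= target:
            return n_samples
    return None  # target never reached within the logged curve

learning_curve = [(10_000, 0.2), (50_000, 0.5), (200_000, 0.8), (1_000_000, 0.95)]
print(samples_to_target(learning_curve, target=0.8))  # -> 200000
```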
Overview
Paper Readthrough related to the original paper "Reinforcement Learning, Fast and Slow" (Botvinick et al., 2019)
Index