...Specifically, we consider the setting where there is a very large (maybe infinite)
set of tasks, and each task has many instantiations. For example, a task could be
to stack all blocks on a table into a single tower, another task could be to place
all blocks on a table into two-block towers, etc. In each case, different instances
of the task would consist of different sets of blocks with different initial states.
At training time, our algorithm is presented with pairs of demonstrations for a
subset of all tasks. A neural net is trained such that when it takes as input the first
demonstration and a state sampled from the second demonstration, it should
predict the action corresponding to the sampled state. At test time, a full
demonstration of a single instance of a new task is presented, and the neural net
is expected to perform well on new instances of this new task. Our experiments
show that the use of soft attention allows the model to generalize to conditions and
tasks unseen in the training data. We anticipate that by training this model on a
much greater variety of tasks and settings, we will obtain a general system that can
turn any demonstrations into robust policies that can accomplish an overwhelming
variety of tasks.
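The training signal described above can be sketched in a few lines. This is a toy illustration, not the paper's architecture: states and actions are scalars, "soft attention" is approximated by a softmax over distances between the query state and the demonstration's states (the paper attends over learned neural embeddings), and the names `soft_attention_policy` and `imitation_loss` are hypothetical.

```python
import math

def soft_attention_policy(demo, query_state, temperature=1.0):
    """Predict an action for query_state by attending over the (state, action)
    pairs of a single conditioning demonstration.

    Toy version: attention weights come from a softmax over negative squared
    distance between the query state and each demonstrated state."""
    scores = [-((s - query_state) ** 2) / temperature for s, _ in demo]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(sc - m) for sc in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Predicted action: attention-weighted average of demonstrated actions.
    return sum(w * a for w, (_, a) in zip(weights, demo))

def imitation_loss(demo_a, demo_b):
    """Behavior-cloning objective over a demonstration pair: condition on the
    first demo, predict the actions observed at states from the second."""
    return sum(
        (soft_attention_policy(demo_a, s) - a) ** 2 for s, a in demo_b
    ) / len(demo_b)
```

With a sharp (low-temperature) attention, querying a state that appears in the conditioning demonstration recovers that state's demonstrated action; at training time, `imitation_loss` would be minimized across many demo pairs drawn from many tasks.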
Motivation
At a high level this is effectively what is implemented by the VisualReplayStrategy.
Feature request
This task involves reading this paper in more detail and identifying additional tricks we may be able to use.
https://arxiv.org/abs/1703.07326