SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling. (arXiv:2306.11886v3 [cs.RO] UPDATED)
https://ift.tt/jwTcepI
Pre-training robot policies with a rich set of skills can substantially
accelerate the learning of downstream tasks. Prior work has defined
pre-training tasks via natural language instructions, but doing so requires
tedious human annotation of hundreds of thousands of instructions. Thus, we
propose SPRINT, a scalable offline policy pre-training approach that
substantially reduces the human effort needed for pre-training a diverse set of
skills. Our method uses two core ideas to automatically expand a base set of
pre-training tasks: instruction relabeling via large language models and
cross-trajectory skill chaining through offline reinforcement learning. As a
result, SPRINT pre-training equips robots with a much richer repertoire of
skills. Experimental results in a household simulator and on a real robot
kitchen manipulation task show that SPRINT leads to substantially faster
learning of new long-horizon tasks than previous pre-training approaches.
Website at https://ift.tt/XMFzREx.
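
The first of the two ideas, instruction relabeling, can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the `relabel_adjacent` function, the `max_span` parameter, and the example instructions are all hypothetical, and `join_summarizer` is a plain string join standing in for the large language model call the abstract refers to. The sketch only shows how merging consecutive annotated segments of one trajectory expands a base set of pre-training tasks; cross-trajectory skill chaining is not covered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """A contiguous slice of one trajectory with its language instruction."""
    start: int        # index of the first step (inclusive)
    end: int          # index past the last step (exclusive)
    instruction: str


def relabel_adjacent(
    segments: List[Segment],
    summarize: Callable[[List[str]], str],
    max_span: int = 3,
) -> List[Segment]:
    """Create new tasks by merging runs of 2..max_span consecutive segments.

    `summarize` maps the instructions of a run to one higher-level
    instruction. In practice this would be an LLM call (an assumption here).
    """
    new_tasks = []
    for i in range(len(segments)):
        for j in range(i + 2, min(i + max_span, len(segments)) + 1):
            run = segments[i:j]
            merged = summarize([s.instruction for s in run])
            new_tasks.append(Segment(run[0].start, run[-1].end, merged))
    return new_tasks


def join_summarizer(instructions: List[str]) -> str:
    # Placeholder for an LLM: joins the instructions instead of summarizing.
    return ", then ".join(instructions)


# Hypothetical base annotations for a single trajectory.
base = [
    Segment(0, 10, "pick up the mug"),
    Segment(10, 25, "place the mug in the sink"),
    Segment(25, 40, "turn on the faucet"),
]
expanded = relabel_adjacent(base, join_summarizer)
for task in expanded:
    print(task)
```

With three base segments and `max_span=3`, the sketch yields three additional tasks, each spanning two or three of the original segments, without any further human annotation.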
via cs.RO updates on arXiv.org http://arxiv.org/