M2CURL: Sample-Efficient Multimodal Reinforcement Learning via Self-Supervised Representation Learning for Robotic Manipulation. (arXiv:2401.17032v1 [cs.RO])
https://ift.tt/72yxFck
One of the most critical aspects of multimodal Reinforcement Learning (RL) is
the effective integration of different observation modalities. Having robust
and accurate representations derived from these modalities is key to enhancing
the robustness and sample efficiency of RL algorithms. However, learning
representations in RL settings for visuotactile data poses significant
challenges, particularly due to the high dimensionality of the data and the
complexity involved in correlating visual and tactile inputs with the dynamic
environment and task objectives. To address these challenges, we propose
Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL). Our
approach employs a novel multimodal self-supervised learning technique that
learns efficient representations and contributes to faster convergence of RL
algorithms. Our method is agnostic to the RL algorithm, so it can be integrated
with any available RL algorithm. We evaluate M2CURL on the Tactile Gym 2
simulator and show that it significantly enhances learning efficiency across
different manipulation tasks. This is evidenced by faster convergence rates and
higher cumulative rewards per episode compared to standard RL algorithms
without our representation learning approach.
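
The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice: a generic InfoNCE-style loss that pulls together the visual and tactile embeddings of the same time step and pushes apart embeddings from different time steps. It is not M2CURL's actual loss. The function names (`info_nce`, `normalize`, `dot`), the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce(visual, tactile, temperature=0.1):
    """Mean InfoNCE loss with (visual[i], tactile[i]) as the positive pair.

    Every other tactile embedding in the batch acts as a negative for
    visual[i]. The temperature is a hypothetical default, not a value
    taken from the paper.
    """
    v = [normalize(x) for x in visual]
    t = [normalize(x) for x in tactile]
    total = 0.0
    for i, vi in enumerate(v):
        # Cosine similarity of visual[i] to every tactile embedding.
        logits = [dot(vi, tj) / temperature for tj in t]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the target.
        total += log_denom - logits[i]
    return total / len(v)


# Toy embeddings (made-up numbers): two time steps, two dimensions.
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # tactile agrees with visual
swapped = [[0.1, 0.9], [0.9, 0.1]]   # tactile paired with the wrong step

loss_aligned = info_nce(visual, aligned)
loss_swapped = info_nce(visual, swapped)
```

In this sketch the loss is lower when the two modalities agree per time step than when they are mismatched. Under the stated assumptions, an auxiliary loss of this kind would be added to the encoder's training signal alongside whichever RL algorithm is in use, which is one way the algorithm-agnostic property described above could be realised.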
via cs.RO updates on arXiv.org http://arxiv.org/