facebookresearch / r3m

Pre-training Reusable Representations for Robotic Manipulation Using Diverse Human Video Data
https://sites.google.com/view/robot-r3m/
MIT License

How to apply R3M to downstream robotic task? #5

Closed 2 years ago

yquantao commented 2 years ago

Hi,

in the paper, it is mentioned that only a few demonstrations are needed when applying R3M to downstream robot learning tasks. My question: do we also need to add language annotations to these demos, or are video demos alone sufficient?

suraj-nair-1 commented 2 years ago

No language is needed (it is only used during pre-training); the pre-trained R3M model simply acts as an encoder mapping images to embeddings. So to train with imitation learning you just need a dataset of (image, action) pairs: encode the images with R3M and train with your usual imitation-learning loss. You can see an example of encoding a single image here.
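A minimal sketch of that (image, action) imitation setup. In practice the frozen encoder would be R3M itself (`load_r3m("resnet50")` from the `r3m` package); here a random-weight stand-in module with the same 2048-dim output is used so the snippet runs without downloading a checkpoint, and the 7-dim action space is a hypothetical choice:

```python
import torch
import torch.nn as nn

EMB_DIM = 2048   # embedding size of R3M's ResNet50 backbone
ACT_DIM = 7      # hypothetical action dimension for illustration

# Stand-in for the frozen R3M encoder: maps (B, 3, 224, 224) -> (B, EMB_DIM).
# In real use, replace with: encoder = load_r3m("resnet50")
encoder = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=7, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, EMB_DIM),
)
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)   # the encoder stays frozen

# Small policy head trained with a standard behavior-cloning (MSE) loss.
policy = nn.Sequential(nn.Linear(EMB_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Dummy batch of (image, action) pairs standing in for demonstration data.
images = torch.rand(4, 3, 224, 224)
actions = torch.randn(4, ACT_DIM)

with torch.no_grad():
    emb = encoder(images)            # (4, EMB_DIM) embeddings

pred = policy(emb)                   # predicted actions
loss = nn.functional.mse_loss(pred, actions)
opt.zero_grad()
loss.backward()
opt.step()
print(emb.shape, loss.item())
```

Only the policy head receives gradients; the encoder is queried under `torch.no_grad()`, which is the frozen-representation setup described above.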

Best, Suraj

whc688 commented 1 year ago

@suraj-nair-1 Hi! If I want to use the frozen pre-trained R3M as an encoder, is there any image normalization I need to apply to the images first? If so, what should it be? Thanks a lot!