Status: closed (by yquantao, 2 years ago)
No language is needed at test time (it is only used during training); the pre-trained R3M model simply acts as an encoder mapping images to embeddings. So to train with imitation learning you just need a dataset of (image, action) pairs: encode the images with R3M and train with your usual imitation learning loss. You can see an example of encoding a single image here.
Best, Suraj
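The recipe above (frozen pre-trained encoder, policy head trained by behavioral cloning on (image, action) pairs) can be sketched end to end. This is a minimal, self-contained illustration, not R3M itself: the `frozen_encoder` here is a hypothetical fixed random projection standing in for the real model (in practice you would call the pre-trained R3M network and get a high-dimensional embedding), and the data is synthetic.

```python
import numpy as np

# Hypothetical stand-in for a frozen pre-trained visual encoder such as R3M.
# In practice you would run the real network on preprocessed images; here a
# fixed random projection plays that role. Its weights are never updated.
rng = np.random.default_rng(0)
IMG_DIM, EMB_DIM, ACT_DIM = 64, 16, 4
FROZEN_W = rng.standard_normal((IMG_DIM, EMB_DIM)) / np.sqrt(IMG_DIM)

def frozen_encoder(images):
    """Map (flattened) images to embeddings; the encoder stays frozen."""
    return images @ FROZEN_W

# Synthetic (image, action) demonstration pairs from a made-up expert.
images = rng.standard_normal((256, IMG_DIM))
expert = rng.standard_normal((EMB_DIM, ACT_DIM))
actions = frozen_encoder(images) @ expert  # expert actions

# Behavioral cloning: train only a linear policy head on top of the
# frozen embeddings with a standard MSE imitation loss.
W = np.zeros((EMB_DIM, ACT_DIM))
lr = 0.1
for _ in range(500):
    emb = frozen_encoder(images)          # encoder is queried, not trained
    pred = emb @ W                        # policy head prediction
    grad = emb.T @ (pred - actions) / len(images)  # dMSE/dW (up to a factor of 2)
    W -= lr * grad

final_loss = float(np.mean((frozen_encoder(images) @ W - actions) ** 2))
print(f"imitation MSE after training: {final_loss:.6f}")
```

The point is the division of labor: gradients only flow into the policy head (`W`), while the encoder is treated as a fixed feature extractor, which is why only the (image, action) pairs are needed downstream.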
@suraj-nair-1 Hi! If I want to use the frozen pre-trained R3M as an image encoder, is there any normalization I need to apply to the input images, and if so, which one? Thanks a lot!
Hi,
the paper mentions that only a few demonstrations are needed when applying R3M to downstream robot learning tasks. My question is: do these demos also need language annotations, or are video demos alone sufficient?