Hi, I am interested in this work. In issue #5, you mentioned that the pre-trained R3M model simply acts as an encoder mapping images to embeddings. My question is how the whole framework is used in a downstream robotic task after behavior cloning: given only an image, can the robot directly perform the imitated behavior? What about in a more complex environment? And how can the robot accomplish a specific task requested by a human when no language instruction is given?
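For context, here is my rough understanding of the deployment pipeline as a minimal sketch. I am assuming `load_r3m` and the 0-255 input convention from this repo's README; the MLP policy head, the 7-dim action output, and the preprocessing are just placeholders for whatever the downstream BC setup actually uses. Is this roughly the intended usage?

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from r3m import load_r3m  # pre-trained encoder from this repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Frozen R3M encoder: image -> embedding
r3m = load_r3m("resnet50").to(device)
r3m.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

# 2) Small policy head trained with behavior cloning on (embedding, action)
#    pairs from demonstrations -- sizes here are only illustrative.
policy = nn.Sequential(
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Linear(256, 7),  # e.g. a 7-DoF arm action; task dependent
).to(device)

def act(pil_image):
    """At deployment: current camera image -> R3M embedding -> predicted action."""
    with torch.no_grad():
        img = preprocess(pil_image).unsqueeze(0).to(device) * 255.0  # README says R3M expects [0-255]
        emb = r3m(img)        # (1, 2048) frozen embedding
        return policy(emb)    # (1, 7) action from the BC-trained head
```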