bytedance / GR-1

Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
Apache License 2.0

Question about the Random_shift_augmentation in training. #9

Open YTEP-ZHI opened 2 months ago

YTEP-ZHI commented 2 months ago

Hi @bdrhtw, thanks for open-sourcing this great work. I have a question about the Random_shift_augmentation in training. Is this augmentation only applied during the video prediction stage, or is it applied in both the video prediction and policy learning stages? If it is also used for policy learning, will it cause a misalignment between the pre-recorded actions and the augmented visual observations? Looking forward to your reply.

mbreuss commented 2 months ago

Random shift is commonly used for policy learning in particular: it makes your policy more robust to small changes in viewpoint. Although the shifted part of the image is different every time, the relative position of the robot to the objects and the environment remains the same, and that's the relevant part. Because that geometry is unchanged, the recorded actions still match the augmented observations. This augmentation helps prevent overfitting and improves your policy.
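
To make this concrete, here is a minimal sketch of a pad-and-crop random shift in NumPy. It is not taken from the GR-1 code base: the function name `random_shift`, the `pad` parameter, and the choice to apply one shared offset to every frame in a sequence are all assumptions for illustration. The action labels are never touched; only the pixels move by a few positions.

```python
import numpy as np

def random_shift(frames, pad=4, rng=None):
    """Shift a frame sequence by a random offset of at most `pad` pixels.

    frames: array of shape (T, H, W, C).
    The same offset is used for all T frames (an assumption of this sketch),
    so the sequence stays temporally consistent.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w, _ = frames.shape
    # Replicate border pixels so the crop below never reads outside the image.
    padded = np.pad(frames, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Top-left corner of the crop, each coordinate drawn from [0, 2 * pad].
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    # Crop back to the original size; the output has the same shape as the input.
    return padded[:, dy:dy + h, dx:dx + w, :]

# Hypothetical usage: augment the observations, keep the actions as recorded.
obs = np.random.rand(10, 224, 224, 3).astype(np.float32)
actions = np.zeros((10, 7), dtype=np.float32)
aug_obs = random_shift(obs, pad=10)
```

With a small `pad` relative to the image size, the robot and the objects move together in the frame, which is why the recorded actions can be reused unchanged.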

YTEP-ZHI commented 2 months ago

@mbreuss Thanks for your answer!