@wzcai99
Hmm... maybe you are looking at the video-interpolation frames in the dataset. Are you sure the dataset trajectory never eventually reaches the same angle as the simulator? From the FAQ:
The Full Dataset contains extracted ResNet features for each frame in ['images'], which includes filler frames in between each low-level action (used to generate smooth videos), whereas the Modeling Quickstart only contains features for each low_idx, i.e. the frames after taking each low-level action.
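Not from the repo, just a rough sketch of how the filler frames show up, assuming the standard traj_data.json layout (an 'images' list whose entries carry 'image_name', 'low_idx', 'high_idx'); adjust the path to your local copy:

```python
import json

# Assumed path to one trajectory's annotation file (adjust for your setup).
with open("traj_data.json") as f:
    traj = json.load(f)

images = traj["images"]                                    # every saved frame, incl. filler frames
num_frames = len(images)
num_low_actions = len({img["low_idx"] for img in images})  # unique low-level action indices

print(f"{num_frames} frames in ['images'] vs {num_low_actions} low-level actions")
# num_frames > num_low_actions whenever interpolation frames are present.
```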
The dataset trajectory reaches the same angle as the simulator eventually.
Well, I want to replace the ResNet features with other backbones, so I use the Full Dataset.
But since the dataset contains the interpolated images, do I need to manually remove those interpolated images to train a policy?
Also, I rechecked the JSON file in the dataset: does the maximum image['low_idx'] represent the episode length?
@wzcai99, yes, low_idx is for all frames (including interpolation frames for video), and high_idx is for task-planning-level actions like MoveAhead, etc. (if I remember correctly).
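If it helps, here is a quick, hedged sketch of checking that mapping directly from a trajectory's JSON; the 'plan' keys ('low_actions', 'high_pddl', 'api_action', 'discrete_action') follow the public annotation format as I understand it, so treat them as assumptions and verify against your own files:

```python
import json

with open("traj_data.json") as f:   # assumed per-trajectory annotation file
    traj = json.load(f)

low_actions = traj["plan"]["low_actions"]   # low-level actions (MoveAhead, RotateLeft, ...)
high_actions = traj["plan"]["high_pddl"]    # high-level subgoals (GotoLocation, PickupObject, ...)

# Print which low-level action and high-level subgoal each saved frame belongs to.
for img in traj["images"][:10]:
    low = low_actions[img["low_idx"]]["api_action"]["action"]
    high = high_actions[img["high_idx"]]["discrete_action"]["action"]
    print(img["image_name"], "low_idx:", img["low_idx"], low,
          "| high_idx:", img["high_idx"], high)
```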
I'm also trying to use a different feature extractor. Do we have any easy way to figure out which frames are interpolated to remove them?
@vlongle, from my perspective, the dataset JSON file contains a list of image entries, each with the image name and its low_idx. I enumerate the entire list and keep only the first image among the redundant images that share the same low_idx. As I use PyTorch for training, I only need to enumerate the dataset once when initializing the dataloader, so it does not affect training speed.
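For concreteness, a minimal sketch of that filtering step, assuming each entry of traj_data['images'] has 'image_name' and 'low_idx' (adapt to however your own Dataset class loads the JSON):

```python
import json

def keep_first_frame_per_low_idx(images):
    """Keep only the first saved frame for each low_idx, dropping the
    interpolated filler frames that share the same low-level action index."""
    kept, seen = [], set()
    for img in images:
        if img["low_idx"] not in seen:
            seen.add(img["low_idx"])
            kept.append(img)
    return kept

with open("traj_data.json") as f:   # assumed path to one trajectory's annotation file
    traj = json.load(f)

frames = keep_first_frame_per_low_idx(traj["images"])
# Run this once when building the PyTorch Dataset so it does not slow down training.
```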
I tried to verify the performance of the expert demonstrations in the downloaded dataset, but I found that the turning angle of the rotate action is different in the dataset. In the simulator, after the agent executes a rotation action, it turns about 30 degrees, but in the dataset two subsequent images look almost the same. For example, at the start of an episode, the initial RGB image in the dataset looks like this: [screenshot] The next image in the dataset looks like this: [screenshot] But in the simulator it appears different: [screenshot]
I was wondering which part I might have gotten wrong.