Closed LemonWade closed 11 months ago
Hi @LemonWade,
Thanks for your interest in our work and kind words.
Your confusion is valid. If 36 is the first keyframe, then frame 40 should ideally be timestep 2. Could you point to the code where you find this issue? This could be something borrowed from PerAct's dataloder that I donot remember of the top of my head.
Best, Ankit
Let's look at the following function in theutils.dataset.py
file:
def _add_keypoints_to_replay(
...
next_keypoint_idx: int,
...
):
prev_action = None
obs = inital_obs
for k in range(
next_keypoint_idx, len(episode_keypoints)
): # confused here, it seems that there are many similar samples in the replay
keypoint = episode_keypoints[k]
obs_tp1 = demo[keypoint]
obs_tm1 = demo[max(0, keypoint - 1)]
(
trans_indicies,
rot_grip_indicies,
ignore_collisions,
action,
attention_coordinates,
) = _get_action(
obs_tp1,
obs_tm1,
rlbench_scene_bounds,
voxel_sizes,
rotation_resolution,
crop_augmentation,
)
...
obs_dict = extract_obs(
obs,
CAMERAS,
t=k - next_keypoint_idx,
prev_action=prev_action,
episode_length=25,
)
As we can see,
for k in range(
next_keypoint_idx, len(episode_keypoints)
):
and t=k - next_keypoint_idx
, it implies that every time we run this function, the first data saved will always have t
equal to 0.
This means that when we cross the 0th keyframe and choose the 1st keyframe as next_keypoint_idx, the t
for the first data collected at this point will be reset to 0."
Then t
, after being processed by the extract_obs function, is transformed into a number between [-1, 1], where 0 corresponds to time
being 1
, and this is stored in low_dim_state
. Later, it will go through embedding and enter the model.
Hi @LemonWade,
Great question. Unfortunately, I do not have a very good answer for it. What I would say is that this setup is the same as what is done in PerAct (https://github.com/peract/peract/blob/b73cf504567122f81912964943a16cd83920e4d9/agents/peract_bc/launch_utils.py#L265-L273) and we keep everything in the RLBench dataset the same for fairness.
For the real world, we did as you were mentioning and as mentioned here: https://github.com/NVlabs/RVT/issues/33#issuecomment-1823685357
For RLBench, I would recommend you to try the alternative, i.e. time step does not become zero, and see if that helps. I would be very interested in the results :)
Best, Ankit
Thank you for your answer. I will take it seriously, and if there are any interesting findings, I will reopen this issue.
Hi @imankgoyal
Firstly, I would like to express my gratitude for your exceptional work. It's truly inspiring and helpful.
I have a question related to data handling, specifically regarding keyframes. Let's consider a scenario where there are three keyframes, and the first keyframe is at frame 36. When collecting data for frames
10, 20, and 30,
it appears that thesethree keyframes are recollected,
and thetimestep uniformly decreases
. Moreover, in thelow_dim_state
, frames 10, 20, and 30 all seem to be treated as the starting point of the event,with the timestep set to 1 for each
.However, I'm a bit puzzled about the handling of frame 40.
Why is the timestep reset to 1 at this point?
Additionally, during the evaluation process, I haven't noticed any reset of the timestep. In my opinion, it might be more consistent to disregard the timestep. What are your thoughts on this approach? Any clarification or insight you could provide would be greatly appreciated.Thank you for your time and consideration.
Best regards, LemonWade