Thanks for your interest in our work!
Thanks again for reaching out and sorry for the wait. We are still developing the codebase.
Thanks for your reply! Looking forward to the release.
Regarding 2, do we need to evaluate each task over all 50 initial states to get an unbiased estimate? Or is using the same fixed subset (such as the 20 in your example `evaluate.py`) also fair for evaluation?
Regarding 3, I am still a little confused about how the success rate is obtained: is it one of the AUC, FWT, or NBT metrics proposed in the paper? Do you also apply sequential training for the pretrained and non-pretrained models?
Lastly, I am also curious how the maximum timestep of 600 was selected. Does this value correspond to the maximum demonstration length? Can we use smaller values to accelerate evaluation? (A sketch of the evaluation loop I have in mind follows below.)
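To make the evaluation questions concrete, here is a minimal sketch of the loop I have in mind; `env.set_init_state`, `policy.act`, and `info["success"]` are placeholder names for illustration, not necessarily the benchmark's actual API:

```python
# Hypothetical evaluation sketch (placeholder API, not the benchmark's code):
# estimate one task's success rate over its fixed initial states.
def success_rate(env, policy, init_states, max_steps=600):
    successes = 0
    for state in init_states:
        obs = env.set_init_state(state)     # reset to one of the 50 fixed initial states
        success = False
        for _ in range(max_steps):          # the 600-step horizon asked about above
            obs, reward, done, info = env.step(policy.act(obs))
            if done:
                success = bool(info.get("success", False))
                break
        successes += int(success)
    return successes / len(init_states)

# All 50 states vs. a fixed subset of 20 trades estimator variance for speed:
# rate_all = success_rate(env, policy, init_states)       # unbiased over all 50
# rate_sub = success_rate(env, policy, init_states[:20])  # faster, higher variance
```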
Thanks for the quick response!
Hi @Cranial-XIX, I noticed that in `dataset.py` you mentioned that `frame_stack` is used instead of `seq_len`. However, in `main.py`, the `seq_len` argument is set to 10 while `frame_stack` is set to 1. Would you mind clarifying the padding difference between them, as mentioned in the comment, and which one we should use? Thanks!
Thanks for asking. Based on my understanding, they mainly differ in the padding style (if you set `pad_frame_stack` or `pad_seq_length` to `True`). Please see here. Assume your sequence is `[0,1,2,3,4,5]`. Setting `seq_len=5` and `frame_stack=1` with both paddings enabled will start at `[0,1,2,3,4]` and end at `[5,x,x,x,x]`, where `x` means a zero-padding frame. Setting `frame_stack=5` and `seq_len=1` will start at `[x,x,x,x,0]` and end at `[1,2,3,4,5]`.
But note that in practice, when we do rollouts, we start from index `0` with no padding, because both the LSTM and the Transformer can handle variable-length inputs. So we will remove that comment. Thanks for catching that.
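For concreteness, here is a minimal sketch of the padding semantics described above (an illustration under the stated assumptions, not the actual dataset code):

```python
# Illustrative sketch only, not the actual dataset code.
# `None` stands in for a zero-padded frame (the `x` above).
def make_windows(demo, seq_len, frame_stack, pad_seq_length=True, pad_frame_stack=True):
    """Return one (frame_stack - 1 + seq_len)-long window per timestep."""
    n = len(demo)
    windows = []
    for t in range(n):
        # frames before t come from frame stacking; frames from t onward from seq_len
        idx = range(t - (frame_stack - 1), t + seq_len)
        if not pad_frame_stack and idx.start < 0:
            continue
        if not pad_seq_length and idx.stop > n:
            continue
        windows.append([demo[i] if 0 <= i < n else None for i in idx])
    return windows

demo = [0, 1, 2, 3, 4, 5]
# seq_len=5, frame_stack=1: first window [0, 1, 2, 3, 4], last [5, None, None, None, None]
print(make_windows(demo, seq_len=5, frame_stack=1))
# frame_stack=5, seq_len=1: first window [None, None, None, None, 0], last [1, 2, 3, 4, 5]
print(make_windows(demo, seq_len=1, frame_stack=5))
```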
Hi @Cranial-XIX, thanks for your efforts and wonderful work. Have you provided the checkpoints yet? I didn't find the corresponding files. Best.
Hi @Cranial-XIX @zhuyifengzju, thanks for the great work! Impressive! I have a few questions:

1. Each `spatial`, `goal`, and `object` dataset contains 10 tasks defined by 10 language instructions. Each task contains 50 fixed initial states, and each initial state has its corresponding demonstrations in the dataset, right? (A small inspection sketch follows at the end of this comment.)
2. Do the results in Figure 3 report performance on `LIBERO_10` tasks after performing full fine-tuning on `LIBERO_10`? It is really unclear to me what the settings are for the results and the methods (particularly `w/o pretraining` and `multitask`) in Figure 3. Also, I am curious whether there is any intuition or insight into why pretraining on `LIBERO_90` does not work well.
3. Have you tried training on `LIBERO_90` and then testing on `LIBERO_90`?

Again, I am very interested in this work and feedback would be highly appreciated. Thanks in advance.
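In case it helps pin down question 1, this is roughly how I would inspect the structure; the `get_benchmark_dict` / `get_task_init_states` names follow my reading of the repo and may not match the released API exactly:

```python
# Hedged sketch for checking question 1's assumption: 10 tasks per suite,
# each with 50 fixed initial states. API names are my best guess from the repo.
from libero.libero import benchmark

suite = benchmark.get_benchmark_dict()["libero_spatial"]()
print(suite.n_tasks)                              # expected: 10 tasks
for task_id in range(suite.n_tasks):
    task = suite.get_task(task_id)                # task defined by a language instruction
    init_states = suite.get_task_init_states(task_id)
    print(task.language, len(init_states))        # expected: 50 fixed init states each
```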