I have some questions. Questions 1 and 2 refer to line 168 here.
1. Is there any reason you only use index 0 of the transformer output?
2. During inference, why do you take the -1: index? Why do you use a different setting for inference?
3. In your paper, you mention that the action token is used as input, but I cannot find the code where the action token is used as input. Can you show me where the corresponding code is?
4. Can you explain why TensorUtils.time_distributed is used on this line?
5. During inference, is there a reason you apply post-processing to gripper-history?
Thank you in advance!