Questions about the action token and inference process

I have some questions. Questions 1 and 2 refer to the following code (line 168 here):

    transformer_out = transformer_out.reshape(original_shape)
    action_token_out = transformer_out[:, :, 0, :]
    if per_step:
        action_token_out = action_token_out[:, -1:, :]

1. Is there a reason you use only index 0 of the transformer output?
2. During inference, why do you take the -1: index? Why is the setting different for inference?
3. In your paper, you mention that the action token is used as an input, but I cannot find the code where the action token is passed in as input. Could you point me to the corresponding code?
4. Can you explain why TensorUtils.time_distributed is used on this line?
5. During inference, is there a reason for the post-processing of the gripper history?
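
To make the indexing questions concrete, here is a minimal sketch of how I currently read the snippet above. It assumes the reshaped transformer_out has the layout (batch, time, tokens, dim) and that the action token sits at token index 0. Both are my assumptions rather than something I confirmed in the repo, and the sizes are made up. I use NumPy here only because the slicing semantics match.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
B, T, N, D = 2, 10, 5, 8  # batch, time steps, tokens per step, embedding dim

# Stand-in for the reshaped transformer output (assumed layout: B, T, N, D).
transformer_out = np.arange(B * T * N * D, dtype=np.float32).reshape(B, T, N, D)

# Keep only token index 0 at every time step (assumed to be the action token).
action_token_out = transformer_out[:, :, 0, :]
print(action_token_out.shape)  # (2, 10, 8)

per_step = True
if per_step:
    # Keep only the last time step; the slice -1: preserves the time axis.
    action_token_out = action_token_out[:, -1:, :]
print(action_token_out.shape)  # (2, 1, 8)
```

If this reading is right, the per_step branch returns only the most recent step's action token output while keeping a time axis of length 1. Please correct me if the layout is different.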

Thank you in advance!

UT-Austin-RPL / VIOLA