The output of the original GR1: (s1, a1, s2, a2, s3, a3, s4, a4, ...)
The output of GR-Chunk: (s1, a1, a2, ..., aN, s2, a2, a3, ..., a_(N+1), s3, ...)
That's the only difference in the implementation.
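To make the two output layouts concrete, here is a minimal sketch (hypothetical helper functions, not the repo's actual code) that builds the original GR1 interleaved sequence and the GR-Chunk sequence with overlapping action chunks of size N from the same trajectory:

```python
def gr1_targets(states, actions):
    """Original GR1: interleave state and action, (s1, a1, s2, a2, ...)."""
    out = []
    for s, a in zip(states, actions):
        out.extend([s, a])
    return out

def gr_chunk_targets(states, actions, N):
    """GR-Chunk: after each state s_t, emit the next N actions
    (a_t, ..., a_(t+N-1)); consecutive chunks overlap by N-1 actions."""
    out = []
    for t, s in enumerate(states):
        chunk = actions[t:t + N]
        if len(chunk) < N:  # drop incomplete trailing chunks
            break
        out.append(s)
        out.extend(chunk)
    return out

states = ["s1", "s2", "s3", "s4"]
actions = ["a1", "a2", "a3", "a4"]
print(gr1_targets(states, actions))
# ['s1', 'a1', 's2', 'a2', 's3', 'a3', 's4', 'a4']
print(gr_chunk_targets(states, actions, N=2))
# ['s1', 'a1', 'a2', 's2', 'a2', 'a3', 's3', 'a3', 'a4']
```

With N=2 you can see the overlap: a2 appears in both the chunk after s1 and the chunk after s2, matching the pattern above.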
As for why performance improves, there are two possible explanations:
It reduces the multimodality of the whole problem. There are many possible options when the policy needs to choose a single action, but far fewer when it needs to commit to a future trajectory (you can see this in the following 2D navigation example). This explanation comes from a helpful developer in the LeRobot community, but sorry, I cannot recall their name...
This approach simply helps the model build an implicit world model.