Closed mhyatt000 closed 1 year ago
Thank you for your clear explanation of the problem you are encountering.
We actually do something very similar to your proposed solution in the FrameStackWrapper, which is what is used at test-time during BC-Transformer rollouts.
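For readers unfamiliar with the idea behind frame stacking, here is a minimal sketch of how such a wrapper can maintain a fixed-length window of recent observations. This is an illustrative toy, not robomimic's actual `FrameStackWrapper` API; the class and method names are hypothetical.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last `num_frames` observations, padding by repeating the
    first observation at episode start (hypothetical sketch, not
    robomimic's FrameStackWrapper)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, obs):
        # Fill the stack by repeating the initial observation.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(obs)
        return self.stacked()

    def step(self, obs):
        # Newest observation pushes out the oldest one.
        self.frames.append(obs)
        return self.stacked()

    def stacked(self):
        # Shape: (num_frames, *obs_shape)
        return np.stack(list(self.frames), axis=0)


stack = FrameStack(num_frames=3)
first = stack.reset(np.zeros(2))
print(first.shape)  # (3, 2)
latest = stack.step(np.ones(2))
print(latest[-1])  # most recent observation sits in the last slot
```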
Tagging @snasiriany @MBronars for visibility and in case they would like to comment further on this.
@amandlek Thanks for your reply! I was unaware of FrameStackWrapper.
For my own experiments, I was planning to add the following options:

- training on many data sources simultaneously
- multi-gpu / multi-node training
- hindsight relabeling of goals (and packed hindsight relabeling)

Does RoboMimic already support these things? I was not aware of them when reading the documentation.
Re: "training on many data sources simultaneously" - we plan to support this in the next version!
Re: "multi-gpu / multi-node training" - this is not supported at the moment.
Re: "hindsight relabeling of goals (and packed hindsight relabeling)" - this is not supported at the moment either.
Hello, this is my first issue so apologies if I missed something.
I have been training with `algo/bc.BC-Transformer` and comparing its performance with BC-RNN. However, while the transformer is trained in parallel, inference is autoregressive, since the agent must interact with the RoboSuite environment, i.e.:

```
a0     = f(o0)
a0, a1 = f(o0, o1)
...
```
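The autoregressive pattern above can be sketched as a rollout loop in which the policy sees the full observation history at every step but only the latest predicted action is executed. The `dummy_policy` here is a stand-in for the transformer, not real robomimic code.

```python
def dummy_policy(obs_history):
    # Stand-in for the transformer: returns one action per observation
    # seen so far, i.e. a_0..a_t = f(o_0..o_t).
    return [o * 2 for o in obs_history]


obs_history = []
executed_actions = []
for obs in [1, 2, 3]:  # stand-in for observations from the environment
    obs_history.append(obs)
    actions = dummy_policy(obs_history)
    executed_actions.append(actions[-1])  # execute only the latest action

print(executed_actions)  # [2, 4, 6]
```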
problem
As it stands, the `bc.py` algorithms have no mechanism to pass multiple observations to the `get_action` method. While this is not a problem for BC-RNN (which maintains a hidden state that is reset after each episode), BC-Transformer has no such hidden state and needs all preceding observations in order to accurately predict the next action in the sequence; otherwise the model is not performing sequence modeling.

Further, in `models/obs_nets.MIMO_Transformer`, when the model sees only one observation but expects `cfg.context_length` observations, that single observation is broadcast across all timesteps in `models/obs_nets.MIMO_Transformer.input_embedding()`. In my experiments:

```
tensor of shape (1, 512) + tensor of shape (10, 512) = tensor of shape (10, 512)
```
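This silent broadcast can be reproduced directly in NumPy: adding a (1, 512) array to a (10, 512) array repeats the single row across all ten timesteps, so no error is raised even though nine of the timestep embeddings are copies of the same observation. The variable names below are illustrative.

```python
import numpy as np

single_obs = np.arange(512, dtype=float).reshape(1, 512)  # one real observation
pos_embed = np.zeros((10, 512))                           # expects 10 timesteps

out = single_obs + pos_embed  # broadcasts (1, 512) -> (10, 512), no error
print(out.shape)  # (10, 512)

# Every timestep now holds an identical copy of the same embedding.
print(np.allclose(out[0], out[9]))  # True
```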
solution
I have fixed this problem by retaining observations in a buffer and selecting the last n observations for the rollout. The observations are padded in the event that the agent has not yet experienced `cfg.context_length` observations. Rather than always selecting the last observation in the sequence, the correct observation (relative to the padding) is selected; this is sometimes the last observation. At the end of the rollout episode the buffer is cleared.

If you like these changes, please let me know how I can incorporate them. I am planning to use RoboMimic in the future and would be interested in speaking with one of the maintainers about contributing further improvements.
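The buffering scheme described above can be sketched as follows: keep a growing list of rollout observations, back-pad with zeros until `context_length` observations exist, and index the last *real* observation (which is only the final slot once padding is no longer needed). This is an illustrative sketch, not the actual patch.

```python
import numpy as np


class RolloutBuffer:
    """Serve fixed-length, zero-padded observation windows during rollout
    (illustrative sketch of the proposed fix, not the actual patch)."""

    def __init__(self, context_length):
        self.context_length = context_length
        self.obs = []

    def add(self, ob):
        self.obs.append(ob)

    def window(self):
        # Last `context_length` observations, back-padded with zeros
        # while the episode is still shorter than the context window.
        recent = self.obs[-self.context_length:]
        n_pad = self.context_length - len(recent)
        pad = [np.zeros_like(recent[0])] * n_pad
        window = np.stack(recent + pad, axis=0)
        # Index of the current observation: the last real (unpadded) slot.
        return window, len(recent) - 1

    def clear(self):
        # Called at the end of each rollout episode.
        self.obs = []


buf = RolloutBuffer(context_length=4)
buf.add(np.ones(3))
w, idx = buf.window()
print(w.shape, idx)  # (4, 3) 0  -- padded, current obs is at index 0
```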