Training reward models, subtask detectors, and ESC policies from scratch is likely quite inefficient.
It would be better to build all of them on a general model backbone that is already capable in the MineRL domain.
Luckily, the VPT models are exactly that.
A first step could be to reuse the ImpalaCNN, the first component of the MineRLAgent policy.
Tasks that depend on long-term context could additionally benefit from the embeddings produced by the transformer decoder blocks.
Task: Implement custom modules that only load relevant parts of the VPT model for use in
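One way to load only a relevant part of the VPT model is to filter the checkpoint's state dict by key prefix and load the result into a standalone module. A minimal sketch, assuming PyTorch and that the ImpalaCNN weights live under a prefix such as `policy.net.img_process.` (the prefix and key names below are illustrative assumptions; inspect the actual checkpoint keys before use):

```python
import torch

def extract_submodule_state_dict(state_dict, prefix):
    """Keep only entries under `prefix`, stripping the prefix so the
    result can be loaded directly into the standalone submodule."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy stand-in for a full VPT checkpoint; real key names will differ.
full_sd = {
    "policy.net.img_process.conv0.weight": torch.zeros(3, 3),
    "policy.net.img_process.conv0.bias":   torch.zeros(3),
    "policy.net.recurrent_layer.weight":   torch.zeros(8, 8),
}

impala_sd = extract_submodule_state_dict(full_sd, "policy.net.img_process.")
print(sorted(impala_sd))  # ['conv0.bias', 'conv0.weight']
```

The extracted dict can then be passed to `load_state_dict` on an ImpalaCNN instance constructed with the same architecture hyperparameters as the original VPT policy, leaving the rest of the checkpoint (recurrent/transformer layers, action heads) unloaded.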