Training reward models, subtask detectors, and ESC policies from scratch is likely quite inefficient.
It would be better to build all of them on a general model backbone that is already capable in the MineRL domain.
Luckily, the VPT models are exactly that.
A first step could be to reuse the ImpalaCNN, the first component of the MineRLAgent policy.
Tasks that depend on long-term context could additionally benefit from the embeddings produced by the transformer decoder blocks.
Task: Implement custom modules that only load relevant parts of the VPT model for use in
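One way to load only a relevant part of the VPT model is to filter the checkpoint's state dict by key prefix and load the result into a standalone module. A minimal sketch, assuming PyTorch and that the ImpalaCNN weights live under a prefix such as `policy.net.img_process.` (the prefix and key names below are illustrative assumptions; inspect the actual checkpoint keys before use):

```python
import torch

def extract_submodule_state_dict(state_dict, prefix):
    """Keep only entries under `prefix`, stripping the prefix so the
    result can be loaded directly into the standalone submodule."""
    return {k[len(prefix):]: v for k, v in state_dict.items()
            if k.startswith(prefix)}

# Toy stand-in for a full VPT checkpoint; real key names will differ.
full_sd = {
    "policy.net.img_process.conv0.weight": torch.zeros(3, 3),
    "policy.net.img_process.conv0.bias":   torch.zeros(3),
    "policy.net.recurrent_layer.weight":   torch.zeros(8, 8),
}

impala_sd = extract_submodule_state_dict(full_sd, "policy.net.img_process.")
print(sorted(impala_sd))  # ['conv0.bias', 'conv0.weight']
```

The extracted dict can then be passed to `load_state_dict` on an ImpalaCNN instance constructed with the same architecture hyperparameters as the original VPT policy, leaving the rest of the checkpoint (recurrent/transformer layers, action heads) unloaded.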