Closed BartekCupial closed 7 months ago
I plan to add a report with experiments in the next PR.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 77.83%. Comparing base (6c3ee69) to head (d2f14b8).
:exclamation: Current head d2f14b8 differs from pull request most recent head fd46d79. Consider uploading reports for the commit fd46d79 to get more accurate results.
Motivation
This architecture reaches much better returns, for the reasons listed below, and is featured in a recent paper (current SOTA). It was first introduced in Scaling Laws for Imitation Learning in NetHack.
Credit: Jens Tuyls https://github.com/jens321/
Architecture Details (from the paper)
We use two main architectures for all our experiments, one for the BC experiments and another for the RL experiments.
BC architecture. The NLD-AA dataset is comprised of ttyrec-formatted trajectories, which are 24 × 80 ASCII character and color grids (one for each) along with the cursor position. To encode these, we modify the architecture used in Hambro et al., resulting in the following: