NetHack: add new model

BartekCupial commented 7 months ago

Motivation

This architecture reaches much better returns, for the reasons listed below, and is featured in a recent paper (current SOTA). It was first introduced in Scaling Laws for Imitation Learning in NetHack.

Credit: Jens Tuyls https://github.com/jens321/

Architecture Details (from the paper)

We use two main architectures for all our experiments, one for the BC experiments and another for the RL experiments.

BC architecture. The NLD-AA dataset is comprised of ttyrec-formatted trajectories, which are 24 × 80 ASCII character and color grids (one for each) along with the cursor position. To encode these, we modify the architecture used in Hambro et al., resulting in the following:

Dungeon encoder. This component encodes the main observation in the game, which is a 21 × 80 grid per time step. Note the top row and bottom two rows are cut off as those are fed into the message and bottom line statistics encoder, respectively. We embed each character and color in an embedding lookup table, after which we concatenate them and put them in their respective positions in the grid. We then feed this embedded grid into a ResNet, which consists of 2 identical modules, each using 1 convolutional layer followed by a max pooling layer and 2 residual blocks (of 2 convolutional layers each), for a total of 10 convolutional layers, closely following the setup in Espeholt et al.
Message encoder. The message encoder takes the top row of the grid, converts each ASCII character into a one-hot vector, and concatenates these, resulting in an 80 × 256 = 20,480 dimensional vector representing the message. This vector is then fed into a 2-layer MLP, resulting in the message representation.
Bottom line statistics. To encode the bottom line statistics, we flatten the bottom two rows of the grid and create a “character-normalized” (subtract 32 and divide by 96) and a “digits-normalized” (subtract 47 and divide by 10, mask out ASCII characters smaller than 45 or larger than 58) input representation, which we then stack, resulting in a 160 × 2 dimensional input. This closely follows the Sample Factory model used in Hambro et al.
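The input preprocessing described above can be sketched in plain Python. This is only an illustration of the arithmetic in the paper excerpt, not the code added by this PR: the function names (`split_screen`, `one_hot_message`, `bottom_line_features`) are hypothetical, the screen is assumed to be a 24 × 80 list of integer character codes, and "masking" is assumed to mean setting the digits-normalized value to zero.

```python
ROWS, COLS = 24, 80  # tty screen size stated in the paper excerpt


def split_screen(chars):
    """Split a 24 x 80 grid of character codes into the three encoder inputs."""
    assert len(chars) == ROWS and all(len(row) == COLS for row in chars)
    message = chars[0]      # top row -> message encoder
    dungeon = chars[1:22]   # middle 21 rows -> dungeon encoder
    bottom = chars[22:]     # bottom two rows -> bottom line statistics
    return message, dungeon, bottom


def one_hot_message(message_row, vocab=256):
    """Concatenate one one-hot vector per character: 80 * 256 = 20,480 values."""
    flat = [0.0] * (len(message_row) * vocab)
    for i, code in enumerate(message_row):
        flat[i * vocab + code] = 1.0
    return flat


def bottom_line_features(bottom_rows):
    """Return 160 (character-normalized, digits-normalized) pairs."""
    flat = [code for row in bottom_rows for code in row]  # 2 * 80 = 160 codes
    char_norm = [(code - 32) / 96 for code in flat]
    # Assumption: codes outside 45..58 are masked by zeroing them.
    digit_norm = [(code - 47) / 10 if 45 <= code <= 58 else 0.0 for code in flat]
    return list(zip(char_norm, digit_norm))


# Usage: a blank screen with a short message and a fragment of a status line.
screen = [[ord(" ")] * COLS for _ in range(ROWS)]
screen[0][:5] = [ord(ch) for ch in "Hello"]
screen[23][:4] = [ord(ch) for ch in "HP:7"]

message, dungeon, bottom = split_screen(screen)
msg_vec = one_hot_message(message)
stats = bottom_line_features(bottom)
print(len(dungeon), len(msg_vec), len(stats))  # 21 20480 160
```

In a real model these lists would be tensors, and the one-hot vector would feed the 2-layer MLP mentioned above; the sketch only makes the shapes and normalization constants concrete.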

BartekCupial commented 7 months ago

I plan to add a report with experiments in the next PR.

codecov-commenter commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 77.83%. Comparing base (6c3ee69) to head (d2f14b8).

:exclamation: Current head d2f14b8 differs from pull request most recent head fd46d79. Consider uploading reports for the commit fd46d79 to get more accurate results

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #295   +/-   ##
=======================================
  Coverage   77.83%   77.83%
=======================================
  Files         101      101
  Lines        7773     7773
=======================================
  Hits         6050     6050
  Misses       1723     1723
```


alex-petrenko / sample-factory

NetHack: add new model #295
