alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

NetHack: add new model #295

Closed by BartekCupial 7 months ago

BartekCupial commented 7 months ago

Motivation

This architecture reaches much better returns, for multiple reasons listed below, and it is featured in a recent paper (current SOTA). The architecture was first introduced in Scaling Laws for Imitation Learning in NetHack.

Credit: Jens Tuyls https://github.com/jens321/

Architecture Details (from the paper)

We use two main architectures for all our experiments, one for the BC experiments and another for the RL experiments.

BC architecture. The NLD-AA dataset is comprised of ttyrec-formatted trajectories, which are 24 × 80 ASCII character and color grids (one for each) along with the cursor position. To encode these, we modify the architecture used in Hambro et al., resulting in the following:
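For orientation only, the input described above (one 24 × 80 character grid, one 24 × 80 color grid, and a cursor position) can be sketched as a plain data container. This is not the architecture itself; the class name `TtyFrame`, its field names, and the `(row, col)` cursor ordering are hypothetical and are not taken from the PR or the paper.

```python
from dataclasses import dataclass, field

ROWS, COLS = 24, 80  # terminal grid size stated in the quoted text


def _blank_grid(fill: int) -> list:
    """Build a ROWS x COLS grid filled with a single integer value."""
    return [[fill] * COLS for _ in range(ROWS)]


@dataclass
class TtyFrame:
    """Hypothetical container for one ttyrec-style frame (illustrative names)."""

    # ASCII code of the character shown in each cell (space by default).
    chars: list = field(default_factory=lambda: _blank_grid(ord(" ")))
    # Color index of each cell; the value range is an assumption.
    colors: list = field(default_factory=lambda: _blank_grid(0))
    # Cursor position as (row, col); the ordering is an assumption.
    cursor: tuple = (0, 0)

    def validate(self) -> None:
        """Raise ValueError if the grids or cursor do not fit a 24 x 80 screen."""
        for name, grid in (("chars", self.chars), ("colors", self.colors)):
            if len(grid) != ROWS or any(len(row) != COLS for row in grid):
                raise ValueError(f"{name} must be a {ROWS} x {COLS} grid")
        row, col = self.cursor
        if not (0 <= row < ROWS and 0 <= col < COLS):
            raise ValueError("cursor is outside the grid")


frame = TtyFrame(cursor=(3, 10))
frame.chars[3][10] = ord("@")  # place the player glyph under the cursor
frame.validate()
```

An encoder for such frames would consume the two grids and the cursor together; how it combines them is specified by the paper, not by this sketch.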

BartekCupial commented 7 months ago

I plan to add a report with experiments in the next PR.

codecov-commenter commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 77.83%. Comparing base (6c3ee69) to head (d2f14b8).

:exclamation: Current head d2f14b8 differs from pull request most recent head fd46d79. Consider uploading reports for the commit fd46d79 to get more accurate results

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #295   +/-   ##
=======================================
  Coverage   77.83%   77.83%
=======================================
  Files         101      101
  Lines        7773     7773
=======================================
  Hits         6050     6050
  Misses       1723     1723
```
