alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

Add nethack to sf_examples #289

Closed BartekCupial closed 8 months ago

BartekCupial commented 9 months ago

Overview

This PR introduces NetHack to Sample Factory. The code is based on three repositories:

Key Contributions

Easy installation and experiments

By adding NetHack to the examples in Sample Factory, my main goal is to improve reproducibility and make it easy to experiment with NetHack, since I ran into many issues with installation and experiments. For example, when I tried to reproduce the experiments from the D&D repository, the Dockerfile required fixes to the moolib library and the Cmake.txt file. Additionally, since the D&D repo implements APPO from scratch (and implementation details matter in RL), I found that just by moving to SF I managed to increase the score from 2k to about 2.8k.

Additional metrics

Sample Factory supports logging of additional policy stats if any are found in info["episode_extra_stats"]. I've added wrappers which log blstats and additional auxiliary scores; see sf_examples/nethack/utils/task_rewards.py.
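To illustrate the mechanism, here is a minimal sketch of a wrapper that writes end-of-episode values into info["episode_extra_stats"]. It is not the code from this PR: `DummyEnv`, `ExtraStatsWrapper`, `run_episode`, and the blstats index mapping are all hypothetical, and the sketch avoids a Gymnasium dependency by duck-typing the 5-tuple step API (a real wrapper would likely subclass `gymnasium.Wrapper`).

```python
class DummyEnv:
    """Hypothetical stand-in for an NLE-style env (Gymnasium 5-tuple step API)."""

    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return {"blstats": [0, 1]}, {}

    def step(self, action):
        self.t += 1
        # Fake bottom-line stats: position 0 grows by 10 per step, position 1 by 1.
        obs = {"blstats": [10 * self.t, 1 + self.t]}
        terminated = self.t >= self.episode_len
        return obs, 1.0, terminated, False, {}


class ExtraStatsWrapper:
    """Copies selected blstats entries into info["episode_extra_stats"] at episode end."""

    def __init__(self, env, stat_indices):
        self.env = env
        # stat_indices maps a stat name to a position in obs["blstats"].
        # The positions used below are assumptions for this dummy env only.
        self.stat_indices = stat_indices

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            extra = info.setdefault("episode_extra_stats", {})
            for name, idx in self.stat_indices.items():
                extra[name] = float(obs["blstats"][idx])
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """Steps the env with a dummy action until the episode ends; returns all infos."""
    env.reset()
    infos = []
    done = False
    while not done:
        _, _, terminated, truncated, info = env.step(0)
        infos.append(info)
        done = terminated or truncated
    return infos


env = ExtraStatsWrapper(DummyEnv(), stat_indices={"score": 0, "depth": 1})
infos = run_episode(env)
```

Only the final step's info carries the extra stats, so the values are reported once per episode instead of on every step.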

render_mode=rgb_array

NLE doesn't natively support rgb_array. I've added an rgb_array rendering mode using the RenderCharImagesWithNumpyWrapperV2 wrapper introduced by https://github.com/Miffyli/nle-sample-factory-baseline. With rgb_array in enjoy, we can save and examine episodes. Example: https://github.com/BartekCupial/sample-factory/assets/92169405/47884b73-beeb-4303-a72f-75d202aa87a8
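As a rough illustration of what an rgb_array frame is, the sketch below turns a terminal-style character grid and a colour-index grid into an (H, W, 3) uint8 image. It is a deliberately simplified stand-in, not the RenderCharImagesWithNumpyWrapperV2 implementation: it paints each cell as a solid colour tile instead of drawing glyph images, and `render_rgb_array`, `PALETTE`, and the tile size are hypothetical names and values chosen for the example.

```python
import numpy as np

# Illustrative 16-colour palette (an assumption, not NLE's actual colours).
PALETTE = np.array(
    [
        (0, 0, 0), (170, 0, 0), (0, 170, 0), (170, 85, 0),
        (0, 0, 170), (170, 0, 170), (0, 170, 170), (170, 170, 170),
        (85, 85, 85), (255, 85, 85), (85, 255, 85), (255, 255, 85),
        (85, 85, 255), (255, 85, 255), (85, 255, 255), (255, 255, 255),
    ],
    dtype=np.uint8,
)


def render_rgb_array(chars, colors, tile=8):
    """Returns a (rows * tile, cols * tile, 3) uint8 frame.

    chars:  2D integer array of character codes (e.g. a 24x80 terminal grid).
    colors: 2D integer array of colour indices with the same shape.
    """
    # Look up one RGB triple per cell; fancy indexing returns a fresh copy.
    cell_rgb = PALETTE[colors % len(PALETTE)]
    # Blank cells are drawn black regardless of their colour index.
    cell_rgb[chars == ord(" ")] = 0
    # Upscale each cell to a tile x tile block of pixels.
    return np.repeat(np.repeat(cell_rgb, tile, axis=0), tile, axis=1)


chars = np.full((2, 3), ord(" "), dtype=np.uint8)
colors = np.zeros((2, 3), dtype=np.int8)
chars[0, 1] = ord("@")
colors[0, 1] = 1
frame = render_rgb_array(chars, colors, tile=4)
```

A frame in this layout can be passed to standard video or image writers, which is what makes saving and inspecting episodes possible.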

Evaluation

NetHack episodes can be very long (100k env steps), and because the environment is highly stochastic we usually need many evaluation episodes (typically 1024) to measure policy performance. I highly recommend using the recently introduced eval.py, since running that many episodes with enjoy would take a very long time.
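The need for many episodes follows from how the uncertainty of a mean shrinks with sample size: the standard error falls as 1/sqrt(n). The sketch below uses only the Python standard library and synthetic, made-up returns (not NetHack results); `mean_and_stderr` is a hypothetical helper, not part of eval.py.

```python
import math
import random
import statistics


def mean_and_stderr(returns):
    """Returns the sample mean and the standard error of that mean."""
    n = len(returns)
    mean = statistics.fmean(returns)
    stderr = statistics.stdev(returns) / math.sqrt(n)
    return mean, stderr


# Synthetic, heavy-tailed "episode returns" for illustration only.
rng = random.Random(0)
synthetic_returns = [rng.expovariate(1 / 1000) for _ in range(1024)]

# Compare the uncertainty of the estimate with few vs. many episodes.
mean_small, se_small = mean_and_stderr(synthetic_returns[:32])
mean_large, se_large = mean_and_stderr(synthetic_returns)
print(f"n=32:   mean={mean_small:.0f} +/- {se_small:.0f}")
print(f"n=1024: mean={mean_large:.0f} +/- {se_large:.0f}")
```

Going from 32 to 1024 episodes is a 32x increase in sample size, which narrows the standard error by a factor of about sqrt(32), roughly 5.7, when the spread of returns is similar.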

codecov-commenter commented 9 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (f69cf46) 77.96% compared to head (e97c18a) 77.96%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #289   +/-   ##
=======================================
  Coverage   77.96%   77.96%
=======================================
  Files         101      101
  Lines        7759     7759
=======================================
  Hits         6049     6049
  Misses       1710     1710
```

BartekCupial commented 9 months ago

UPDATE:

Training results, as well as a comparison with the Dungeons and Data APPO, are available in the docs at docs/09-environment-integrations/nethack.md. I've also created a model card on Hugging Face: https://huggingface.co/LLParallax/sample_factory_human_monk (it's also linked in the docs).