All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (f69cf46) 77.96% compared to head (e97c18a) 77.96%.
Training results, as well as a comparison with Dungeons and Data (D&D) APPO, are available in the docs at docs/09-environment-integrations/nethack.md. I've also created a model card on Hugging Face: https://huggingface.co/LLParallax/sample_factory_human_monk (it's also linked in the docs).
Overview
This PR introduces NetHack to Sample Factory. The code is based on three repositories, including the D&D codebase and https://github.com/Miffyli/nle-sample-factory-baseline.
Key Contributions
Easy installation and experiments
By adding NetHack to the Sample Factory examples, my main goal is to improve reproducibility and allow for easy experimentation with NetHack, as I found many issues with installation and experiments. For example, when trying to reproduce the experiments in the D&D repository, the Dockerfile required fixes to the moolib library and the CMake configuration. Additionally, since the D&D repo implements APPO from scratch (implementation details matter in RL), I found that just by moving to SF I managed to increase the score from 2k to about 2.8k.
Additional metrics
Sample Factory supports logging additional policy stats whenever they are found in info["episode_extra_stats"]. I've added wrappers that log blstats and additional auxiliary scores; see sf_examples/nethack/utils/task_rewards.py. A minimal sketch of the mechanism follows.
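For illustration, here is a minimal sketch of how a wrapper can feed such stats to Sample Factory. It assumes the classic gym 4-tuple step API; the stat names and blstats indices are illustrative only, not the actual implementation in task_rewards.py:

```python
import gym


class EpisodeExtraStatsWrapper(gym.Wrapper):
    """Publish custom per-episode stats via info["episode_extra_stats"],
    which Sample Factory logs alongside its standard policy metrics."""

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Illustrative entries only: indices 12/13 are depth/gold in
            # NLE's blstats layout (an assumption; verify against nle.nethack).
            info["episode_extra_stats"] = {
                "depth": int(obs["blstats"][12]),
                "gold": int(obs["blstats"][13]),
            }
        return obs, reward, done, info
```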
render_mode=rgb_array
NLE doesn't natively support rgb_array rendering. I've added an rgb_array mode by using the RenderCharImagesWithNumpyWrapperV2 wrapper introduced in https://github.com/Miffyli/nle-sample-factory-baseline. By using rgb_array in enjoy, we can save and examine episodes. Example: https://github.com/BartekCupial/sample-factory/assets/92169405/47884b73-beeb-4303-a72f-75d202aa87a8
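A minimal usage sketch, assuming the wrapper is importable from the vendored baseline code and exposes frames through render; the import path and constructor defaults below are assumptions:

```python
import gym
import nle  # noqa: F401 -- registers the NetHack envs with gym

# Import path is an assumption; the wrapper originates from
# https://github.com/Miffyli/nle-sample-factory-baseline.
from sf_examples.nethack.utils.wrappers import RenderCharImagesWithNumpyWrapperV2

env = RenderCharImagesWithNumpyWrapperV2(gym.make("NetHackScore-v0"))
env.reset()
frame = env.render(mode="rgb_array")  # HxWx3 uint8 array, ready for video export
```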
Evaluation
NetHack episodes can be very long (on the order of 100k env steps), and because the environment is highly stochastic, we usually need many evaluation episodes (typically 1024) to measure policy performance reliably. I highly recommend using the recently introduced eval.py, since evaluating with enjoy would take a very long time.
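For reference, a hypothetical invocation sketch; the module path and env name are assumptions, and the flags follow Sample Factory's usual CLI conventions (check the docs page added in this PR for the exact command):

```bash
# Assumed module path and env name; --env/--experiment/--train_dir/--max_num_episodes
# are standard Sample Factory CLI flags.
python -m sf_examples.nethack.eval \
    --env=nethack_challenge \
    --experiment=sample_factory_human_monk \
    --train_dir=./train_dir \
    --max_num_episodes=1024
```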