facebookresearch / nle

The NetHack Learning Environment

Possible memory leak when resetting the environment #337

Closed ngoodger closed 1 year ago

ngoodger commented 1 year ago

🐛 Bug

Possible memory leak when resetting environment

To Reproduce

Steps to reproduce the behavior:

Using the latest nle==0.8.1

Run the following:

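import gym
import nle  # noqa: F401  (importing nle registers the NetHack environments with gym)

env = gym.make("NetHackChallenge-v0")
for i in range(1_000_000_000):
    obsv = env.reset()  # process memory grows as resets accumulate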

The memory used by the process keeps increasing for as long as the loop runs. Running the same loop with "CartPole-v0" instead, memory usage stays static.
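To put numbers on the growth described above, the process's resident set size can be sampled every few resets. The following is a minimal sketch using only the standard library, not part of the original report: `peak_rss_mib`, `sample_memory`, and `run_nle_repro` are hypothetical helper names, and it reports peak RSS (which never decreases), so a steadily rising value across samples is what would suggest a leak.

```python
import resource
import sys


def peak_rss_mib():
    """Return this process's peak resident set size in MiB (stdlib only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def sample_memory(step, iterations, every=100):
    """Call step() `iterations` times; record (iteration, peak RSS) every `every` calls."""
    samples = []
    for i in range(1, iterations + 1):
        step()
        if i % every == 0:
            samples.append((i, peak_rss_mib()))
    return samples


def run_nle_repro(iterations=10_000, every=1_000):
    """Hypothetical driver: apply sample_memory to env.reset from the report."""
    import gym
    import nle  # noqa: F401  (importing nle registers the NetHack environments)

    env = gym.make("NetHackChallenge-v0")
    for i, rss in sample_memory(env.reset, iterations, every):
        print(f"{i:>6} resets: peak RSS {rss:.1f} MiB")
```

Calling `run_nle_repro()` and then repeating the run with "CartPole-v0" substituted would give two series that can be compared directly.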

I see the same behavior on:

  1. Ubuntu 20.04
  2. Ubuntu 20.04 under WSL (a different machine)
  3. Colab, using this setup script:
    def colab_setup():
        # install prerequisites for nle
        !sudo apt-get install -y build-essential autoconf libtool pkg-config \
            python3-dev python3-pip python3-numpy git libncurses5-dev \
            libzmq3-dev flex bison
        !git clone https://github.com/google/flatbuffers.git
        !cd flatbuffers && cmake -G "Unix Makefiles" && make -j2 && sudo make install
        !pip install cmake==3.15.3
        !pip install nle pyvirtualdisplay transformers aicrowd_gym line_profiler
    colab_setup()

Expected behavior

I expect the memory usage of the process to remain static, or at least close to static.

Environment

From the Ubuntu 20.04 machine, as this was the cleanest environment:

Collecting environment information...
NLE version: 0.8.1
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
CMake version: version 3.24.0

Python version: 3.8
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 471.41
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.17.4
[conda] Could not collect

Additional context

I tried using tracemalloc to find the issue, but none of the memory it tracked grew with longer runtime or more resets.
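One possible explanation for the empty tracemalloc result, not confirmed anywhere in this report, is that the growth happens in native code: tracemalloc only records allocations made through Python's own memory allocators, so memory obtained directly from the C library is invisible to it. The sketch below illustrates that limitation in isolation. It assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `malloc` and `free`, and `traced_growth_during_native_alloc` is a hypothetical helper name.

```python
import ctypes
import tracemalloc

# Assumption: a POSIX system, where CDLL(None) exposes libc's malloc/free.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def traced_growth_during_native_alloc(nbytes):
    """Allocate nbytes with libc malloc; return the growth tracemalloc observed."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    ptr = libc.malloc(nbytes)
    if not ptr:
        tracemalloc.stop()
        raise MemoryError("malloc failed")
    ctypes.memset(ptr, 0, nbytes)  # touch the pages so they become resident
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    libc.free(ptr)  # release the native buffer again
    return after - before


growth = traced_growth_during_native_alloc(64 * 1024 * 1024)
print(f"tracemalloc saw {growth} bytes of growth for a 64 MiB native allocation")
```

If the leak is of this kind, a native memory tool (for example valgrind or a heap profiler attached to the process) is more likely to locate it than tracemalloc.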