alex-petrenko / sample-factory

High throughput synchronous and asynchronous reinforcement learning
https://samplefactory.dev
MIT License

Memory leaks #116

Closed. frankie4fingers closed this issue 2 years ago

frankie4fingers commented 2 years ago

Hello, I'm trying to use SF with a custom gym env with observation size 360. I'm running a lot of envs - here is my config:

  cfg.num_workers = 16
  cfg.num_envs_per_worker = 256
  cfg.num_batches_per_iteration = 32
  cfg.traj_buffers_excess_ratio = 1.0
  cfg.learning_rate = 1e-4
  cfg.ppo_clip_ratio = .2
  cfg.rollout = 128
  cfg.recurrence = 16
  cfg.ppo_epochs = 1
  cfg.batch_size = cfg.num_workers * cfg.num_envs_per_worker * cfg.rollout // cfg.num_batches_per_iteration
  cfg.with_vtrace = False
  cfg.num_minibatches_to_accumulate = -1
  cfg.gamma = 0.999

With 4096 envs on Ubuntu 20.04, SF's memory consumption grows by an additional 1 GB every 5-10 minutes, so my 32 GB is not enough for 1 hour of training. I tried to investigate a possible memory leak in SF with pympler, but so far without success, so could you please take a look? I also use a custom encoder. My throughput is around 50k samples.
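
For reference, here is roughly the kind of check I ran with pympler (a minimal sketch; MyCustomEnv stands in for my actual env and the import path is made up). Note that SummaryTracker only sees Python-level objects, so a leak in native code would not show up here:

  from pympler import tracker

  from my_project.envs import MyCustomEnv  # placeholder for my actual env

  env = MyCustomEnv()
  tr = tracker.SummaryTracker()

  obs = env.reset()
  for step in range(100_000):
      obs, reward, done, info = env.step(env.action_space.sample())
      if done:
          obs = env.reset()
      if step > 0 and step % 10_000 == 0:
          # Prints the growth in Python objects since the last call.
          # Leaks inside C extensions (numba, torch) will not appear here.
          tr.print_diff()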

alex-petrenko commented 2 years ago

Hey, thank you for reporting this! SF itself is usually not known for memory leaks, but your environment might be leaking. Or perhaps one of the Python dependencies has a bug that causes a leak - could be PyTorch for instance.

As a first diagnostic step, can you please provide a screenshot or send over your tensorboard summaries? One of the summary groups provides information about the memory usage of the different components: actor workers, policy workers, and one learner. If you see a steady leak in actor worker RAM usage, there's a good chance it is your environment. If the leak is in the policy workers or the learner, it is more likely something specific to SampleFactory or your custom encoder.
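
For example, a quick way to check whether any of the memory summaries keep growing without opening the tensorboard UI (a rough sketch; the summary directory layout and the exact tag names are assumptions, adjust them for your run):

  from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

  # Path to one policy's summary dir (assumption: <train_dir>/<experiment>/.summary/0)
  acc = EventAccumulator("train_dir/my_experiment/.summary/0")
  acc.Reload()

  # Grab anything memory-related; the actual tag names may differ.
  memory_tags = [t for t in acc.Tags()["scalars"] if "memory" in t.lower()]
  for tag in memory_tags:
      events = acc.Scalars(tag)
      print(f"{tag}: {events[0].value:.1f} -> {events[-1].value:.1f} over {len(events)} points")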

If you indeed find that the actor workers are leaking, I recommend running the sampler (just an experience collection loop with a random policy). If it's your environment, you should still see the leak there, and it will be easier to diagnose. See here: https://github.com/alex-petrenko/sample-factory#dummy-sampler
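
If you prefer to take SampleFactory out of the equation entirely, a standalone loop like this (random actions, no learner) usually exposes an env-side leak just as well; psutil's RSS reading also catches leaks in native extensions that Python-level tools miss. MyCustomEnv is a placeholder for your env class:

  import os
  import psutil

  from my_project.envs import MyCustomEnv  # placeholder for your actual env

  proc = psutil.Process(os.getpid())
  env = MyCustomEnv()
  obs = env.reset()

  for step in range(1_000_000):
      obs, reward, done, info = env.step(env.action_space.sample())
      if done:
          obs = env.reset()
      if step % 50_000 == 0:
          rss_mb = proc.memory_info().rss / 1024 ** 2
          # Steadily growing RSS here means the leak is in the env itself
          # (including any native extensions it uses), not in SampleFactory.
          print(f"step {step}: RSS = {rss_mb:.0f} MB")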

frankie4fingers commented 2 years ago

Thanks, that helped me find the root of the problem - you were right, it's my custom env, specifically numba-generated code with the nojit option.

Could you please help me with another question? Inside APPO I found that for plain PPO mode we need to set num_workers * num_envs_per_worker * rollout == num_batches_per_iteration * batch_size and num_minibatches_to_accumulate = 0, but with this setup SF waits for the last chunk of data forever, looping with the debug message: Waiting for 128 trajectory buffers... (see my cfg above).

I'm trying to reproduce the Dota 2 paper setup (32 SGD steps per large batch with asynchronous experience runners), but with my custom env. I found that the DDPPO settings with 2 large batches and 2 epochs do not work at all, even when the LR is adjusted by sqrt(batch_size / 256). In my case 2 batches with 32 epochs works really well, but I suspect that is more like overfitting to the data, so I want to try 16 minibatches and 2 epochs. And just to make sure everything works fine, I'm starting from simple PPO without the async feature for now.
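
For concreteness, here is the arithmetic for the cfg above (just a sanity-check sketch; the sqrt scaling is the DDPPO-style adjustment I mentioned, not something SF applies for you):

  num_workers, num_envs_per_worker, rollout = 16, 256, 128
  num_batches_per_iteration = 32

  samples_per_iteration = num_workers * num_envs_per_worker * rollout  # 524288
  batch_size = samples_per_iteration // num_batches_per_iteration      # 16384

  # The plain-PPO constraint holds by construction with this batch_size:
  assert num_batches_per_iteration * batch_size == samples_per_iteration

  # LR adjusted by sqrt(batch_size / 256) relative to my base LR of 1e-4:
  base_lr = 1e-4
  scaled_lr = base_lr * (batch_size / 256) ** 0.5  # sqrt(64) = 8 -> 8e-4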