NickLucche opened this issue 2 months ago (Open)
Hello,
why would you do that instead of using a callback, for instance?
I'm also wondering why you would recreate the environment every time instead of just calling learn(..., reset_num_timesteps=False)
(see our doc)?
Does the higher memory usage also happen when using a DummyVecEnv?
The code is structured that way because my (actual) environment depends on some initial seed/state, which I can use to simulate ~"unseen" data and test generalization. It's then very straightforward to re-use the regular training script to train on different "splits"/conditions, i.e. just calling train
in a loop like that.
But I believe the use case is of secondary importance here if there's some actual unreleased resource we could address (assuming there's no blunt mistake on my side).
Does the higher memory usage also happen when using a DummyVecEnv?
Yep, still happens even when
vec_env = make_vec_env(make_env, n_envs=12)
Decreased the obs space to 256x256x2 to better highlight the ramp-up before my system OOMs.
depends on some initial seed/state, which I can use to simulate ~"unseen" data and test generalization

.reset(seed=...) is normally made for that (.seed() for a VecEnv, and then do a reset).
🐛 Bug
Hey, thanks a lot for your work! I am trying to debug an apparent memory leak / higher-than-expected memory usage when running the training code multiple times, but I can't pinpoint its cause. I've boiled down my problem to the snippet below. Basically, when starting sequential training runs I get higher memory consumption than with a single run, when I would expect all resources to be released after the
PPO
object is collected. I believe the only real difference in this example is the obs and action space, which mimics my use case.

Single run memory usage:
model.learn(total_timesteps=500_000)

Multi run memory usage:
model.learn(total_timesteps=25_000)
N times. Crashes early due to OOM.

To Reproduce
Relevant log output / Error message
No response
System Info
Checklist