eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas
MIT License
384 stars 134 forks

train baseline - memory consumption increases with iterations #157

Open DaoudPiracha opened 5 years ago

DaoudPiracha commented 5 years ago

Upon running the train_baseline command in python3, the workers seem to consume RAM with each iteration without properly freeing it up.

On some runs, I have seen memory consumption increase by ~40 MB/iteration, which scaled to e.g. 10,000 iterations amounts to 400 GB. Since this is in RAM, it makes longer experiments impossible to run.

Additionally, please note that this seems to be distinct from Ray's object store: upon termination, the object store was quite small (~20 MB). However, each worker (`PolicyEvaluator`) had ~2 GB allocated, with multiple such workers present.
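One way to confirm this kind of per-iteration growth is to log the process's resident memory between iterations. This is a minimal sketch using only the standard library; the "training iteration" placeholder is illustrative, not code from this repo.

```python
import resource


def peak_rss_mb():
    """Peak resident set size of this process in MB.

    Note: on Linux, ru_maxrss is reported in kilobytes;
    on macOS it is reported in bytes.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


baseline = peak_rss_mb()
# ... run one training iteration here ...
growth = peak_rss_mb() - baseline
print(f"peak RSS growth this iteration: {growth:.1f} MB")
```

If the printed growth stays roughly constant per iteration (e.g. ~40 MB each time) rather than plateauing, that points to a genuine leak instead of one-time buffer allocation.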

eugenevinitsky commented 5 years ago

Oh, that's bad! Are you actually seeing this fail as a result? Normally I see a RAM increase, but eventually RLlib somehow clears it up. However, this seems like more of an RLlib issue than an issue with this library (I suspect). Would you mind reposting this in their GitHub issues?

DaoudPiracha commented 5 years ago

Yes. Unfortunately, it typically fails on most longer runs. I'll repost on the RLlib GitHub as well.

I'm currently getting this issue after simply cloning the current repo and running train_baseline.

eugenevinitsky commented 5 years ago

Hi, that's really good to know! Thank you for updating us on this. I'll examine it as well when I get a chance, but I suspect it's an RLlib issue rather than something on our end. I don't think any memory is persisted across environment rollouts.

DaoudPiracha commented 5 years ago

Sounds good. For now, could you possibly share your current environment/setup, in which RLlib clears up storage automatically? Possibly as a Docker container?
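In the meantime, a working environment can be captured without a full Docker image by recording the interpreter, OS, and pinned package versions. A small sketch (the output filename is arbitrary):

```python
import platform
import subprocess
import sys

# Record the interpreter and OS for the bug report.
print("python:", sys.version.split()[0])
print("platform:", platform.platform())

# Freeze installed packages so the setup can be reproduced
# (e.g. copied into a Dockerfile's `pip install -r` step).
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout
with open("requirements-freeze.txt", "w") as f:
    f.write(frozen)
print(f"wrote {len(frozen.splitlines())} pinned packages")
```

Comparing the two environments' frozen requirements (especially the `ray` version) would help isolate whether the cleanup behavior depends on the RLlib release.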