NAMD / pypelinin

Python library to distribute jobs and pipelines among a cluster

Reduce memory consumption of daemons and client #41

Open turicas opened 11 years ago

turicas commented 11 years ago

Some users have reported that the daemons are consuming too much memory, even when there are no jobs (memory is not freed).

I think the problem is worse in Broker and PipelineManager, but we don't know yet what is causing it. We need to benchmark and optimize the code.
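As a starting point for those benchmarks, a minimal sketch of measuring a daemon's memory footprint from inside the process, using the standard-library `resource` module (Unix-only; the allocation below is just a stand-in for real job data):

```python
import resource

def peak_rss():
    """Peak resident set size of this process.

    ru_maxrss is reported in kilobytes on Linux and bytes on macOS,
    so treat the value as a relative measurement on one machine only.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
blob = [b"x" * 1024 for _ in range(10_000)]  # allocate roughly 10 MB
after = peak_rss()
del blob  # destroying the objects does not shrink peak RSS
print(before, after)
```

Logging `peak_rss()` before and after each job in Broker and PipelineManager would show which daemon grows and whether memory ever goes back down.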

Related to #39.

turicas commented 11 years ago

Some links that could help:

turicas commented 11 years ago

Worker processes on Broker probably need to be "refreshed" (killed and restarted) after executing X jobs. Since CPython does not reliably return memory to the operating system when objects are destroyed, the most effective way to reclaim it is to kill the process.

Some time ago, every job was executed in a fresh worker process (see https://github.com/NAMD/pypln.backend/commit/19aa104a38f0b2365a3a14441887a5efa27284ff); we then shifted to the current approach: long-running worker processes that start when Broker starts and are killed when Broker is killed. Maybe we need a solution somewhere between these two.
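The middle-ground idea could look something like the sketch below: a worker exits voluntarily after a fixed number of jobs and the parent spawns a replacement. All names (`MAX_JOBS_PER_WORKER`, `worker`, `run_all`) are hypothetical, and `job * 2` stands in for real job execution:

```python
import queue
from multiprocessing import Process, Queue

# Hypothetical threshold: how many jobs a worker runs before retiring.
# A real value should come from memory benchmarks, not a guess.
MAX_JOBS_PER_WORKER = 3

def worker(jobs, results):
    """Process jobs until the limit is hit, then exit voluntarily.

    Exiting lets the operating system reclaim all memory the worker
    accumulated, which CPython would otherwise keep holding.
    """
    for _ in range(MAX_JOBS_PER_WORKER):
        job = jobs.get()
        if job is None:  # sentinel: no more work
            return
        results.put(job * 2)  # placeholder for real job execution

def run_all(items):
    """Run all jobs, spawning a fresh worker whenever one retires."""
    jobs, results = Queue(), Queue()
    for item in items:
        jobs.put(item)
    jobs.put(None)  # sentinel, consumed by the final worker
    done = []
    while len(done) < len(items):
        proc = Process(target=worker, args=(jobs, results))
        proc.start()
        proc.join()  # worker retired (limit reached) or saw the sentinel
        while True:  # drain whatever this worker produced
            try:
                done.append(results.get(timeout=0.1))
            except queue.Empty:
                break
    return done

if __name__ == "__main__":
    print(run_all([1, 2, 3, 4, 5]))  # each job's payload doubled
```

Compared with a fresh process per job, this amortizes the fork cost over `MAX_JOBS_PER_WORKER` jobs while still bounding how much memory any single worker can accumulate.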

turicas commented 11 years ago

These projects may help:

fccoelho commented 11 years ago

@turicas , See this, straight from the multiprocessing documentation:

New in version 2.7: maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.
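For reference, using that parameter is a one-line change if the code is on `multiprocessing.Pool`; `work` below is just a stand-in for a real pypelinin job, and the `with` form requires Python 3.3+ (on 2.7 you would call `close()`/`join()` explicitly):

```python
from multiprocessing import Pool

def work(n):
    """Stand-in for a real job; must be a module-level function."""
    return n * n

if __name__ == "__main__":
    # Each worker process is replaced after 10 tasks, so memory that
    # CPython retains internally is periodically returned to the OS.
    with Pool(processes=2, maxtasksperchild=10) as pool:
        print(pool.map(work, range(5)))  # squares of 0..4
```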

turicas commented 11 years ago

@fccoelho, thanks! I'm not currently using multiprocessing.Pool (I wrote my own Pool class), but I'll read the documentation carefully and decide whether it's worth switching.