danuker / symreg

A Symbolic Regression engine
MIT License

Multiprocessing is slow #6

Open danuker opened 3 years ago

danuker commented 3 years ago

Multiprocessing is very slow, compared to evolving one generation.

One 50-individual generation evolves in around 10ms, whereas starting a new process is 100-200 ms.
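The overhead can be measured directly. Below is a minimal sketch (not symreg code) that times how long it takes to start and join a fresh process; on a typical Linux machine this lands in the range quoted above, dwarfing a 10 ms generation:

```python
import time
import multiprocessing as mp


def _noop():
    pass


def measure_spawn_overhead(trials: int = 5) -> float:
    """Average wall-clock cost of starting and joining one fresh process."""
    # Use the fork context explicitly (the Linux default), so the child
    # does not re-import this module as the spawn method would.
    ctx = mp.get_context("fork")
    start = time.perf_counter()
    for _ in range(trials):
        p = ctx.Process(target=_noop)
        p.start()
        p.join()
    return (time.perf_counter() - start) / trials


if __name__ == "__main__":
    print(f"process start/join overhead: {measure_spawn_overhead() * 1000:.1f} ms")
```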

Still, benchmarks show that small generations with frequent feedback work best.

There is no obvious way to parallelize, because the GIL would block the CPU-intensive threads.

Think about what can be done.
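The GIL problem is easy to demonstrate. In this sketch, a pure-Python CPU-bound loop is run twice serially and then in two threads; on CPython the threaded version takes roughly as long as the serial one, because only one thread can execute Python bytecode at a time:

```python
import time
from threading import Thread


def busy_work(n: int = 2_000_000) -> int:
    # Pure-Python CPU-bound loop; holds the GIL while computing.
    total = 0
    for i in range(n):
        total += i
    return total


def timed(fn) -> float:
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


serial = timed(lambda: (busy_work(), busy_work()))


def threaded():
    threads = [Thread(target=busy_work) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


parallel = timed(threaded)
# On CPython, `parallel` is usually close to `serial`: CPU-bound
# threads do not actually run concurrently under the GIL.
print(f"serial: {serial:.2f}s  two threads: {parallel:.2f}s")
```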

danuker commented 3 years ago

Posted on Reddit. TODO: try multithreading with Numba or Cython. Tried Numba; it requires too many changes. For instance, I can't use lambdas.

danuker commented 3 years ago

Why not have a population of 5000 individuals? That would be easy to generate locally on each process, wouldn't it?

The reason I'm hesitant is that if you only perform selection after evaluating 5000 individuals, you spend a lot of CPU time creating irrelevant individuals instead of deriving new ones from those already proven fit. I must design a benchmark that captures this effect, and compare with PySR.

danuker commented 3 years ago

Consider using Twisted (and Deferreds):

Instead of forking every generation (which spends a lot of time creating new processes), create a pool of long-running processes only once at the start, and only send them solutions to evaluate.
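The pool idea is also expressible with the stdlib, without Twisted. A minimal sketch of the architecture follows; `evaluate` is a hypothetical stand-in for symreg's real fitness function, and an "individual" here is just a list of numbers:

```python
import multiprocessing as mp


def evaluate(individual):
    # Hypothetical stand-in for symreg's real fitness evaluation.
    return sum(c * c for c in individual)


def run(generations: int = 3, pop_size: int = 50):
    population = [[float(i)] for i in range(pop_size)]
    # Start the pool ONCE; workers stay alive across generations, so the
    # per-generation cost is only pickling individuals back and forth,
    # not forking new processes.
    with mp.get_context("fork").Pool(processes=4) as pool:
        for _ in range(generations):
            fitnesses = pool.map(evaluate, population)
            # ... selection / mutation would go here; as a placeholder,
            # the population is left unchanged.
    return fitnesses
```

Whether this beats the single-process baseline still depends on serialization cost per individual, which is what the measurements below are meant to find out.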

To evaluate performance, use 50 individuals per generation:

  1. Solutions per second on one CPU without a server in between (as it currently is)
  2. Solutions per second on one CPU with a server in between (serialization, requests, and deserialization).
    • Profile and minimize sending of resources
    • Evaluate if it's worth continuing
  3. Solutions per second on multiple CPUs with a server
    • Estimate and maximize p and speedup in the context of Amdahl's law
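For step 3, Amdahl's law gives the ceiling on speedup once the parallel fraction p is measured. A one-liner makes the stakes concrete: if 90% of the work parallelizes, 4 CPUs give about 3.08x, and even infinitely many CPUs cap out at 1 / (1 - 0.9) = 10x.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup with parallel fraction p on n CPUs: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)


print(amdahl_speedup(0.9, 4))   # about 3.08
print(amdahl_speedup(0.9, 64))  # already near the 10x ceiling
```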

Research:

rolisz commented 3 years ago

Another suggestion: use Ray. I haven't used it myself yet, but I keep hearing about it more and more often.

Otherwise, I believe the way to go is to start several processes in the beginning and send them work afterwards.

danuker commented 3 years ago

Thanks for visiting my humble project, and for the cool suggestion! I believe my next move will be to compare Dask and Ray.

If that doesn't work, I am thinking of better-architected multiprocessing: instead of starting processes each generation, I would start a pool of N long-running servers, and: