danuker opened this issue 3 years ago
Why not have a population of 5000 individuals? That would be easy to generate locally on each process, wouldn't it?
The reason I'm hesitant about that is that if you only perform selection after evaluating 5000 individuals, you spend a lot of CPU time creating irrelevant individuals, instead of deriving new ones from those already proven fit. I need to design a benchmark that captures this effect and compare against PySR.
Consider using Twisted (and Deferreds):
Instead of forking every generation (which takes a lot of time to create new processes), create a pool of long-running processes only once at the start, and only send them the solutions to evaluate.
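A rough sketch of that persistent-pool idea, using the standard library's `ProcessPoolExecutor` rather than Twisted itself (the function and variable names are placeholders, not the project's real code):

```python
from concurrent.futures import ProcessPoolExecutor

def evaluate_individual(individual):
    return sum(individual)  # placeholder for the real fitness evaluation

if __name__ == "__main__":
    population = [[1, 2, 3], [4, 5, 6]]  # placeholder individuals
    with ProcessPoolExecutor(max_workers=4) as pool:  # workers start only once
        for generation in range(100):
            # Only the individuals are sent to the workers each generation.
            fitnesses = list(pool.map(evaluate_individual, population))
            # ... selection / mutation would produce the next population here
```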
To evaluate performance, use 50 individuals per generation, and research the achievable speedup in the context of Amdahl's law.
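For reference, Amdahl's law bounds the speedup by the serial fraction of the work; a tiny worked example (the 90% figure below is only an illustration, not a measurement):

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup when only part of the work can be parallelized."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)

# If, say, 90% of a generation's time is fitness evaluation (parallelizable),
# then 8 workers can give at most about 4.7x, not 8x:
print(amdahl_speedup(0.9, 8))  # ~4.71
```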
Another suggestion: use Ray. I haven't used it myself yet, but I keep hearing about it more and more often.
Otherwise, I believe the way to go is to start several processes in the beginning and send them work afterwards.
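Roughly, the Ray version of that pattern would look something like this (an untested sketch based on Ray's documented remote-function API; `evaluate_individual` is a placeholder for the real fitness function):

```python
import ray

ray.init()  # starts the long-running worker processes once

@ray.remote
def evaluate_individual(individual):
    return sum(individual)  # stand-in for the real fitness function

population = [[1, 2, 3], [4, 5, 6]]
for generation in range(100):
    futures = [evaluate_individual.remote(ind) for ind in population]
    fitnesses = ray.get(futures)
    # ... selection / mutation would produce the next population here
```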
Thanks for visiting my humble project, and for the cool suggestion! I believe my next move will be to compare Dask and Ray.
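For comparison, the Dask side of that experiment might look roughly like this (again only a sketch with placeholder names, using `dask.distributed`):

```python
from dask.distributed import Client

def evaluate_individual(individual):
    return sum(individual)  # stand-in fitness function

if __name__ == "__main__":
    client = Client()  # starts a local worker pool once
    population = [[1, 2, 3], [4, 5, 6]]
    for generation in range(100):
        futures = client.map(evaluate_individual, population)
        fitnesses = client.gather(futures)
        # ... selection / mutation would produce the next population here
    client.close()
```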
If that doesn't work, I am thinking of better-architected multiprocessing: instead of starting new processes each generation, I would start a pool of N long-running server processes once, at the start, and send them only the individuals to evaluate each generation (roughly as sketched below).
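Something along these lines, using only the standard library's `multiprocessing` queues (placeholder names; the real evaluation function lives in the project):

```python
import multiprocessing as mp

def evaluate_individual(individual):
    return sum(individual)  # placeholder fitness function

def worker(task_queue, result_queue):
    # Long-running "server": keeps pulling work until it sees the None sentinel.
    for index, individual in iter(task_queue.get, None):
        result_queue.put((index, evaluate_individual(individual)))

if __name__ == "__main__":
    tasks, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(tasks, results)) for _ in range(4)]
    for w in workers:
        w.start()  # processes are created once, before evolution starts

    population = [[1, 2, 3], [4, 5, 6]]
    for generation in range(100):
        for item in enumerate(population):
            tasks.put(item)  # only the individuals cross the process boundary
        fitnesses = dict(results.get() for _ in population)  # index -> fitness
        # ... selection / mutation would produce the next population here

    for _ in workers:
        tasks.put(None)  # tell each worker to shut down
    for w in workers:
        w.join()
```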
Multiprocessing is very slow compared to evolving one generation.
One 50-individual generation evolves in around 10 ms, whereas starting a new process takes 100-200 ms.
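A quick way to reproduce the startup-overhead measurement on any machine (a sketch, not the exact benchmark behind the numbers above):

```python
import time
import multiprocessing as mp

def noop():
    pass

if __name__ == "__main__":
    start = time.perf_counter()
    p = mp.Process(target=noop)  # spawn one throwaway worker process
    p.start()
    p.join()
    print(f"start + join one process: {(time.perf_counter() - start) * 1000:.1f} ms")
```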
Still, in benchmarks, small generations with frequent feedback are what work best.
There is no obvious way to parallelize with threads, because the GIL would serialize the CPU-intensive work.
Think about what can be done.