DEAP / deap

Distributed Evolutionary Algorithms in Python
http://deap.readthedocs.org/
GNU Lesser General Public License v3.0

Remove SCOOP from DEAP #404

Open fmder opened 5 years ago

yxliang01 commented 5 years ago

I'm wondering why SCOOP is going to be removed?

fmder commented 5 years ago

Scoop is no longer in development, and to my knowledge, no longer supported.

soravux commented 5 years ago

Still works, though! :)

I think the parallelism ecosystem, especially in Python, has changed tremendously since SCOOP's inception. There are alternatives with more features and development resources than a one-man side-project. The only niche aspect that has not been replicated elsewhere is SCOOP's parsers for typical academic clusters management systems, such as SLURM and PBS-derivatives, but I think they can be reused easily.

HamiltonWang commented 5 years ago

SCOOP is great, but it is no longer in development. Please suggest another alternative that is as good.

soravux commented 5 years ago

In the past, SCOOP has often been compared to Pathos, which is still in relatively active development and remains a viable alternative (https://github.com/uqfoundation/pathos). I believe the futures interface of Dask (https://dask.org/) is one of the most interesting alternatives currently; it even has native support for HPC clusters (https://docs.dask.org/en/latest/setup/hpc.html). Other solutions that come to mind include Celery (also broker-based, http://www.celeryproject.org/), Joblib (https://joblib.readthedocs.io/en/latest/), Pyina (https://github.com/uqfoundation/pyina), Dispy (https://github.com/pgiri/dispy), and Ray (https://github.com/ray-project/ray). Most of them have much more developed memory-sharing semantics, provide diagnostic tools, and use simpler or better-suited communication layers than the ZeroMQ SCOOP uses under the hood. There are also much better serialization alternatives now than the one used by SCOOP (e.g. Dill).

As I said, SCOOP still works for many people, if you want to use it. However, I won't have the time anytime soon to update it, and I don't feel comfortable accepting most pull requests, as they provide either features that are untested or support for specific cluster systems that I have no access to and therefore could not maintain, despite my best wishes.

HamiltonWang commented 5 years ago

Is it possible that anyone can provide an example of Dask + DEAP? SCOOP doesn't really work smoothly for me right now.

cyrilpic commented 5 years ago

There are more or less sophisticated ways to interact with Dask (or just its cluster library: distributed), but if you just want to get started with the default cluster (close to SCOOP), you can do:

```python
import dask.bag as db

def dask_map(func, iterable):
    # Distribute func over the iterable using Dask's default scheduler
    bag = db.from_sequence(iterable).map(func)
    return bag.compute()

# ...
# In your DEAP setup:
toolbox.register('map', dask_map)
```
HamiltonWang commented 5 years ago

@cyrilpic it works beautifully, thanks

Dadle commented 4 years ago

+1 for Celery support.

Any example implementation using Docker orchestration would also be much appreciated!

DMTSource commented 4 years ago

I was reading about Ray (thank you @soravux for that suggestion above) and its solutions for shared-memory objects, which avoid duplicating global scope by moving data to the node level (so cool) without having to reinvent any wheels. I wanted to give it a quick try. Note that on Ray 0.9 or higher things look even more familiar: I should have used ActorPool + map, as mentioned in the Ray docs, to get a fixed-size pool similar to multiprocessing/SCOOP that batches out the work instead of firing off all the workers at once.

Quick example of Ray 0.7.6 + DEAP that uses shared-memory data and all processes on a machine for evaluation (I had to move the GP compile step outside the remote functions due to the lack of global scope in eval): https://gist.github.com/DMTSource/b80f1afb854f688dcccc4d60b18a721f

About shared-memory objects and Ray in general: https://ray.readthedocs.io/en/latest/walkthrough.html#objects-in-ray

fmder commented 4 years ago

Thanks for sharing, it looks like really cool stuff!


bjuergens commented 4 years ago

this works for me:

```python
from dask.distributed import Client

# Start a local Dask scheduler using threads (no extra processes)
client = Client(processes=False)

def dask_map(*args, **kwargs):
    # Submit the work and block until all results are gathered
    return client.gather(client.map(*args, **kwargs))
```

This example should behave exactly like scoop.futures.map.

I would love to do processes=True, but then I would have to initialize the classes in deap.creator somehow when setting up the worker. I assume this would be easy enough, but I don't know enough about dask to do it. (Also the above example is good enough for me for now)

fmder commented 4 years ago

Thanks, I'll look into that for multiprocessing; it looks promising.

gfviegas commented 4 years ago

Just tested a couple of the options listed above. Ray's is elegant and powerful; beyond parallelization itself, it also has some nice-to-have tools that integrate well with DEAP.

It has a live dashboard and logging which is fantastic!

+1 to Ray.

richardliaw commented 4 years ago

hey - I'm one of the Ray maintainers. Glad to see Ray mentioned here - I'm more than happy to help out for the Ray integration!

DMTSource commented 4 years ago

Today I was able to start an integration for Ray, and made some headway in trying to replace the way we use Scoop in Deap. It comes with a GA and GP example based on the existing Onemax and Symbreg examples.

Here is the repo, ideally to be added to DEAP when appropriate, providing a drop-in use of Ray for registering a 'map' function, replacing SCOOP in as close a way as possible to what we are already used to. Some changes had to be made elsewhere versus the original examples to preserve scope for the remote workers.

Please check it out! Now DeltaPenalty and Ephemeral Constants work just fine at scale!!!

https://github.com/DMTSource/deap_ray

richardliaw commented 4 years ago

That's awesome @DMTSource ! I wonder if we can get in touch with the DEAP maintainers to see what they think..

NewJerseyStyle commented 3 years ago

Looking forward to the integration with Ray.

Was going to open an issue asking about it, so happy to see this already in progress ❤️

🎊 🎉