CodeReclaimers / neat-python

Python implementation of the NEAT neuroevolution algorithm
BSD 3-Clause "New" or "Revised" License

The distributed example doesn't work #251

Open nexon33 opened 2 years ago

nexon33 commented 2 years ago

The example code at https://github.com/CodeReclaimers/neat-python/blob/master/examples/xor/evolve-feedforward-distributed.py doesn't seem to work, and I haven't been able to get it running.

lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager'

It would be really cool to run this on multiple devices and have it train a lot quicker.
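For readers hitting the same AttributeError: the message is consistent with a general pickle limitation, namely that a class defined inside a function (a "local object") cannot be pickled, because pickle stores classes by their importable name. The backslash in the traceback path suggests Windows, where multiprocessing has to pickle what it sends to a new process. The sketch below only reproduces that limitation with the standard library; `TopLevelManager`, `get_manager_class`, and `can_pickle` are hypothetical names and not part of neat-python.

```python
import pickle


class TopLevelManager:
    """Defined at module level, so pickle can locate it by its qualified name."""


def get_manager_class():
    """Return a class defined inside a function (a 'local object')."""

    class LocalManager:
        pass

    return LocalManager


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Depending on the Python version, local objects raise one or the other.
        return False


if __name__ == "__main__":
    print(can_pickle(TopLevelManager))      # class defined at module level
    print(can_pickle(get_manager_class()))  # class created inside a function
```

Moving such a class to module level is the usual way around this, though whether that alone is enough for neat.distributed is not something this sketch shows.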

nexon33 commented 2 years ago

I did get it to work, but indeed it's a bit unreliable, like the docs say.

Would be really excited to see that being picked up however. :)

bennr01 commented 2 years ago

Hi,

I am the guy who wrote neat.distributed a couple of years ago. The old version tried to utilize multiprocessing's distributed functionality, but (as you can see) it has some issues. I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but it should solve most problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains the entire changes, but it's also somewhat outdated compared to the main repository.

Regarding performance: how much neat.distributed can improve the performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor, ...), the performance cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend either using neat.parallel or PyPy instead (note: pypy+neat.parallel is slower).
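The trade-off described above can be made concrete with a back-of-the-envelope model. This is only a sketch under stated assumptions: evaluations split evenly across workers, and the per-genome serialization and transfer overhead is paid one genome after another on the primary node. The function names and all numbers are illustrative, not measurements of neat.distributed.

```python
import math


def single_device_time(n_genomes, eval_seconds):
    """Estimated seconds per generation when evaluating genomes one by one."""
    return n_genomes * eval_seconds


def distributed_time(n_genomes, eval_seconds, overhead_seconds, n_workers):
    """Estimated seconds per generation when spreading genomes over workers.

    Assumption: the per-genome overhead (serialize, send, deserialize) is not
    parallelized, while the evaluations themselves run in even batches.
    """
    batches = math.ceil(n_genomes / n_workers)
    return batches * eval_seconds + n_genomes * overhead_seconds


if __name__ == "__main__":
    # Cheap evaluation (xor-like): overhead dominates.
    print(single_device_time(150, 0.001), distributed_time(150, 0.001, 0.005, 8))
    # Expensive evaluation (minutes per genome): overhead is negligible.
    print(single_device_time(10, 180), distributed_time(10, 180, 0.005, 10))
```

Under this model, distribution only pays off once a single evaluation costs clearly more than the per-genome overhead.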

nexon33 commented 2 years ago

> Hi,
>
> I am the guy who wrote neat.distributed a couple of years ago. The old version tried to utilize multiprocessing's distributed functionality, but (as you can see) it has some issues. I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but it should solve most problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains the entire changes, but it's also somewhat outdated compared to the main repository.
>
> Regarding performance: how much neat.distributed can improve the performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor, ...), the performance cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend either using neat.parallel or PyPy instead (note: pypy+neat.parallel is slower).

In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would really speed things up and I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I will try and take a look at the code tomorrow.

nexon33 commented 2 years ago

I'm having trouble merging the repositories, as I have almost never done it before. The main problem is how to merge this so that I can start selecting which code should stay and which shouldn't.

Is there any other way I can contact you?

bennr01 commented 2 years ago

> In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would really speed things up and I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?

I haven't tested it. In theory it should work as long as you set num_workers=1 on each secondary node and manually start a PyPy process for each core on each secondary node. This is because, IIRC, PyPy loses a lot of its performance benefits when using multiprocessing.Pool, although this may depend on the exact use case and may have changed in the last couple of years. Running a separate PyPy process for each core may allow you to circumvent this.
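As a rough, untested sketch of "one PyPy process per core", a small launcher could be run on each secondary node. It assumes a `pypy3` executable on the PATH and a hypothetical script `evolve_secondary.py` that sets up the distributed evaluator in secondary mode with num_workers=1; neither name comes from neat-python itself.

```python
import os
import subprocess


def build_worker_commands(n_processes, interpreter="pypy3", script="evolve_secondary.py"):
    """Return one argv list per worker process to start on this node.

    Both default names are placeholders: adjust them to your interpreter
    and to your own secondary-node script.
    """
    return [[interpreter, script] for _ in range(n_processes)]


def launch_workers(commands):
    """Start every command as an independent OS process and return the handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


if __name__ == "__main__":
    n_cores = os.cpu_count() or 1
    processes = launch_workers(build_worker_commands(n_cores))
    # Block until every worker process has exited.
    for proc in processes:
        proc.wait()
```

Each process is a separate interpreter, so no multiprocessing.Pool is involved on the PyPy side.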

> I'm having trouble merging the repositories, as I have almost never done it before. The main problem is how to merge this so that I can start selecting which code should stay and which shouldn't.

For anyone else reading this: I've responded to a separate issue in my fork here.