Open nexon33 opened 2 years ago
I did get it to work, but it's indeed a bit unreliable, like the docs say.
Would be really excited to see that being picked up, however. :)
Hi,
I am the guy who wrote neat.distributed a couple of years ago. The old version tried to use multiprocessing's distributed functionality, but (as you can see) it has some issues.
I rewrote the entire module ~5 years ago (see #125). The changes weren't merged and by now seem to cause some merge conflicts, but the rewrite should solve most problems. In case you want to check it out: my branch https://github.com/bennr01/neat-python/tree/distributed_socket contains all of the changes, but it's also somewhat outdated compared to the main repository.
Regarding performance: how much neat.distributed can improve performance varies greatly depending on your exact network evaluation function. For most "small" examples (e.g. xor, ...), the cost of serializing and deserializing exceeds the cost of simply evaluating on a single device. If you are looking to improve evaluation speed for such examples, I'd recommend using either neat.parallel or PyPy instead (note: pypy+neat.parallel is slower).
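To get a rough feel for the serialization trade-off described above, one can time a cheap evaluation against a pickle round trip of the object being evaluated. This is only a sketch: the `genome` dict and `xor_fitness` function are hypothetical stand-ins, not neat-python objects, and a real distributed run adds network latency on top of the pickling cost measured here.

```python
import pickle
import time


def xor_fitness(genome):
    """Cheap stand-in evaluation (hypothetical, not the neat-python API)."""
    cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    w1, w2 = genome["weights"]
    error = 0.0
    for (a, b), expected in cases:
        output = a * w1 + b * w2
        error += (output - expected) ** 2
    return 4.0 - error


def pickle_roundtrip(obj):
    """Serialize and deserialize, as sending work to another process would."""
    return pickle.loads(pickle.dumps(obj))


def seconds_per_call(func, arg, repeats=2000):
    """Average wall-clock time of func(arg) over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) / repeats


# Hypothetical genome-like payload; real genomes are larger objects.
genome = {"key": 1, "weights": (0.5, -0.5), "connections": list(range(50))}

eval_cost = seconds_per_call(xor_fitness, genome)
transfer_cost = seconds_per_call(pickle_roundtrip, genome)
print(f"evaluate: {eval_cost:.2e} s, pickle round trip: {transfer_cost:.2e} s")
```

If the evaluation time is not clearly larger than the round-trip time, distributing the work is unlikely to pay off; for an evaluation that takes minutes, the pickling cost should be negligible by comparison.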
In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would speed things up a lot and I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?
I will try and take a look at the code tomorrow.
I'm having trouble merging the repositories, as I have almost never done it before. The main problem is how to merge them so that I can start selecting which code should stay and which shouldn't.
Is there any other way I can contact you?
In fact, I'm trying to optimize a function that takes a few minutes to complete, so running this distributed would speed things up a lot and I wouldn't need to wait half an hour for one generation. Is PyPy in combination with distributed computing advised?
I haven't tested it. In theory it should work as long as you set num_workers=1 on each secondary node and manually start a PyPy process for each core on each secondary node. This is because, IIRC, PyPy loses a lot of its performance benefits when using multiprocessing.Pool, although this may depend on the exact use case and may have changed in the last couple of years. Running a separate PyPy process for each core may allow you to circumvent this.
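A minimal sketch of the "one process per core" idea, assuming the interpreter is available as `pypy3` and that the secondary-node script is called `evolve_secondary.py` (both names are hypothetical). The script itself would be your own evolve script, configured with num_workers=1 as suggested above; this helper only starts the processes.

```python
import os
import subprocess


def build_worker_commands(script, cores=None, interpreter="pypy3"):
    """Return one command line per core; each runs its own interpreter process.

    `interpreter` and `script` are assumptions for illustration.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return [[interpreter, script] for _ in range(cores)]


def launch_workers(commands):
    """Start every command as a separate OS process and wait for all to exit."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in processes]


# Example (not executed here):
# launch_workers(build_worker_commands("evolve_secondary.py"))
```

Because each worker is an independent interpreter process, this avoids multiprocessing.Pool entirely on the secondary node; whether that actually recovers PyPy's speed advantage would need to be measured.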
I'm having trouble merging the repositories, as I have almost never done it before. The main problem is how to merge them so that I can start selecting which code should stay and which shouldn't.
For anyone else reading this: I've responded to a separate issue in my fork here.
The example code at https://github.com/CodeReclaimers/neat-python/blob/master/examples/xor/evolve-feedforward-distributed.py doesn't seem to work, and I haven't been able to get it running. It fails with:
lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_ExtendedManager._get_manager_class.<locals>._EvaluatorSyncManager'
It would be really cool to run this on multiple devices and have it train a lot quicker.
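For context on the error class: pickle cannot serialize a class that is defined inside a function body, and the traceback names such a class (`<locals>._EvaluatorSyncManager`). The backslash path suggests Windows, where multiprocessing starts child processes by spawning and has to pickle the objects it hands over, which is presumably where this fails. The sketch below reproduces the same kind of failure with only the standard library; `LocalManager` is a hypothetical stand-in, not the neat-python class.

```python
import pickle


def make_local_instance():
    """Create an instance of a class defined inside a function body."""

    class LocalManager:  # hypothetical stand-in for a locally defined class
        pass

    return LocalManager()


def try_pickle(obj):
    """Return None on success, or the error message if pickling fails."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        return str(exc)
    return None


error = try_pickle(make_local_instance())
print(error)
```

In general, the remedy for this kind of failure is to define the class at module level so that pickle can find it by name; whether that is the right fix for neat.distributed itself is not established in this thread.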