CodeReclaimers / neat-python

Python implementation of the NEAT neuroevolution algorithm
BSD 3-Clause "New" or "Revised" License

Question about: TODO: Add a curses-based reporter. #51

Open evolvingfridge opened 7 years ago

evolvingfridge commented 7 years ago

In reporting.py there is "TODO: Add a curses-based reporter". I would like to develop it, but I have some questions. In the past I used urwid (http://urwid.org/). Is it OK to use urwid? The updated reporting could take a config parameter that selects curses or print reporting. Additionally, startup and some configuration could be done via the urwid UI.

CodeReclaimers commented 7 years ago

Sure, any curses-like library that works should be fine. I may break out stuff like this into one or more separate Python packages that people can choose to install for convenience or specific needs, so that the core neat-python still only depends on the standard library.

evolvingfridge commented 7 years ago

Great. I was playing with almost all of the libraries today; so far my favorite cross-platform lib is asciimatics, since almost all the others are not cross-platform. Curses-based libraries need to be run in a real terminal; they will not work in an IDE's terminal. I will share some prototype ideas soon.

evolvingfridge commented 7 years ago

@CodeReclaimers, I am still not sure what the best way to implement it is. I did not spend much time on curses because I was implementing mpi4py with neat-python.

CodeReclaimers commented 7 years ago

No worries, a curses reporter was just kind of a nice thing to have at some point. I'll be interested to see what you come up with for mpi4py--MPI was one of those things I never took time to learn (even though I probably should have).

evolvingfridge commented 7 years ago

I need to clean up the code, but I have no problem sharing it with you. In short, the first implementation is a scattered-data version: the data is scattered across nodes/processes so that eval_genomes is applied on each process to a subset of the data, and then the output is gathered on the master process. I am also working on scattering genomes across nodes/processes. I have a draft of documentation on how to set up a Windows IDE and a Linux cluster/MPI/parallel HDF5, and on the NEAT implementation with MPI. So far I am not sure when and how to upload all this to GitHub :)
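For readers following along, here is a minimal sketch of the scatter/evaluate/gather pattern described in this comment, simulated in a single process. It is not the code from the thread: `split_chunks`, `partial_error`, and `evaluate_scattered` are hypothetical names, and the candidate is just a plain function rather than a NEAT network. With mpi4py, `comm.scatter(chunks, root=0)` and `comm.gather(partial, root=0)` would move the data between ranks; a plain loop stands in for the ranks here.

```python
def split_chunks(items, n_parts):
    """Split items into n_parts nearly equal contiguous chunks."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for rank in range(n_parts):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def partial_error(predict, samples):
    """Sum of squared errors of one candidate on one rank's subset of the data."""
    return sum((predict(x) - y) ** 2 for x, y in samples)


def evaluate_scattered(predict, samples, n_ranks):
    """Simulate scatter -> per-rank evaluation -> gather and reduce on the master."""
    chunks = split_chunks(samples, n_ranks)                   # "scatter" step
    partials = [partial_error(predict, c) for c in chunks]    # work done on each rank
    return sum(partials)                                      # "gather" and reduce


# Toy data: the target is y = 2x; the candidate is off by a constant 1.
samples = [(x, 2 * x) for x in range(10)]
total = evaluate_scattered(lambda x: 2 * x + 1, samples, n_ranks=4)
```

Because the error is a plain sum over samples, the total does not depend on how the data is split, which is what makes this kind of evaluation easy to distribute.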

evolvingfridge commented 7 years ago

With reporting I have an issue with the relation between inputs and outputs, which I need in order to definitively build a statistical model. Any suggestion would be welcome; I got to a crazy point in terms of whether subsampling the data is meaningful and/or meaningless.

drallensmith commented 7 years ago

Could you describe the statistical problem in more detail? The input and output nodes (pins) should be fixed from the start, with the relation between them governed by the evolutionary pressure for a given output from a given input, so I don't think I'm understanding the difficulty you're having.

Regarding mpi4py, @bennr01 and I have been working on using pure-python means for distributed computing (multiprocessing module); while it currently has headaches on even a single machine, another concern is that we don't have a cluster for testing it on...

evolvingfridge commented 7 years ago

Adding neurons is currently done by random chance. The idea is to analyze the inputs and outputs of each individual in the population and, based on that analysis, decide when to add neurons and how many to add. So the growth of neurons is dictated not only by the current state and random chance of an individual in the population, but more by the size (complexity) of the input/output space. This is similar to Novelty Search, but I am still figuring out how similar my idea is to Novelty Search.

I have set up a complete mpi4py-based cluster many times, on a single machine and on multiple machines (18 nodes on a gigabit network). I use mpi4py with ipyparallel. I can write docs on how to set up a VirtualBox cluster with mpi4py and ipyparallel where code can be executed from Windows or Linux machines; I also have some very crude but working code of NEAT with mpi4py on such a cluster. The problem with a pure-Python multiprocessing implementation was inefficient inter-process communication; MPI was at the time a simple solution and arguably one of the most adaptable.

To avoid setup complexity we can just publish VirtualBox machines. With some collaboration, I don't mind writing full cluster setup how-tos and providing VirtualBox images as well. For experimentation, VirtualBox overhead should in theory not be a critical point.

drallensmith commented 7 years ago

Complexity of input space - somewhat similar to the ES-HyperNEAT question of how to distinguish the best places to add a new neuron (the original dissertation was using weight variance, but stated that something else might well work better). One interesting variant of Novelty Search was "Novelty of thinking" - monitoring the hidden nodes as well as the outputs to see how variable responses were to the inputs; it wound up generalizing better. That was with fixed-topology networks, but the authors remarked that it could be done using NEAT's homology-detection innovation numbers to see what to compare with what.

An extension onto NEAT-Python for mpi may do best, as IIRC @CodeReclaimers stated elsewhere, as a dependent project, so that NEAT-Python can remain using Python only; having a shared interface between the pure-Python and mpi4py variants of the distributed computing code would be very helpful for such, and also allow for other implementations. (Exactly how coupled processes need to be between machines - as in how efficient inter-process communication needs to be - is going to vary depending on the evaluation function, of course.) @bennr01 now has available an ARM-cluster, but it still needs setting up.

evolvingfridge commented 7 years ago

"Novelty of thinking" is a major issue for me: how to monitor a network specifically in relation to the input space. Am I wrong in saying that the root of the monitoring problem is in data clustering algorithms?

Yes, I remember that it has to be a pure-Python implementation :) Alright, I think my last code was without any modification to the neat library; it only required installing mpi4py. The best way so far is to use both ipyparallel and mpi4py. The old pull requests that were meant as examples, but wrongly uploaded (my mistake), are here: https://github.com/CodeReclaimers/neat-python/pull/72 Please let me know if you have any suggestions based on the old pull requests and/or questions.

How can I help @bennr01 set up the ARM cluster? I can do all aspects of Beowulf cluster setup, from network, firewall, and LDAP authentication to mpi4py/parallel HDF5 setup and configuration.

drallensmith commented 7 years ago

Hmm... there are various sorts of clustering questions in regard to novelty and similar. For input space, how does one cluster together similar situations, for a variety of purposes? The simplest clustering methods tend to work off of distances - either single distances or distances that can be combined together using euclidean or other metrics (manhattan, maximum...). Looking at multiple distance measures at the same time is somewhat analogous to multiobjective optimization. There are other means - if temporality is present, then what input space portions tend to lead to the same next-time-around (dependent on the actions taken...) portion of the input space, for instance (admittedly dependent itself on definitions of "same"...)? Other ways of looking at input space can be "what variables tend to vary together" - can one simplify it by taking into account correlations (Principal Components Analysis for relatively-linear cases, for instance)? Yet another way to look at it is if one has a set of learners - NEAT, decision/regression trees, or whatever - are there patterns in how they are giving the same result or different results; areas of disagreement tend to be ones that need to be broken up further. The area of unsupervised learning also has some potentially-useful algorithms for finding patterns.
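To make the distance-based clustering mentioned above concrete, here is a small sketch of the three metrics named (Euclidean, Manhattan, maximum) plus a deliberately naive threshold grouping. The `group_by_threshold` helper and the sample points are illustrative assumptions, not something from neat-python or from this thread; a real analysis would more likely use an established clustering algorithm.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of per-coordinate absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    """Largest per-coordinate absolute difference (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))


def group_by_threshold(points, metric, threshold):
    """Greedy single pass: a point joins the first cluster whose first
    member is within `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for p in points:
        for cluster in clusters:
            if metric(p, cluster[0]) <= threshold:
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return clusters


# Hypothetical input vectors: two tight groups far apart from each other.
points = [(0.0, 0.0), (0.5, 0.5), (5.0, 5.0), (5.2, 4.9)]
clusters = group_by_threshold(points, euclidean, threshold=1.0)
```

Swapping `euclidean` for `manhattan` or `maximum` changes which points count as "close", which is one simple way to see how the choice of metric shapes the resulting clusters.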

Once one has some idea on input clustering, then the problem tends to become how to find distances between behaviors when inputs are similar (and then clustering the close ones for analysis purposes, finding the best members of a Hall of Fame, or whatever). Of course, here too one can have questions about multiple distances (not only in different behavioral parameters - how much does one count differences in a particular output, or a particular hidden node? - but if behavior is very similar for part of the input space but very different for another part, for instance).
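As a rough illustration of a behavioral distance over similar inputs, the sketch below compares two networks on a shared set of probe inputs and weights each output separately, which is one way to express "how much does one count differences in a particular output". Everything here is an assumption for illustration: the networks are plain callables returning a list of outputs, and `behavior_distance`, `probe_inputs`, and `output_weights` are hypothetical names.

```python
def behavior_distance(net_a, net_b, probe_inputs, output_weights):
    """Mean weighted absolute difference between two networks' outputs
    over a shared set of probe inputs."""
    total = 0.0
    for inputs in probe_inputs:
        out_a, out_b = net_a(inputs), net_b(inputs)
        total += sum(
            w * abs(a - b) for w, a, b in zip(output_weights, out_a, out_b)
        )
    return total / len(probe_inputs)


# Two stand-in "networks" that agree on the first output and disagree on the second.
def net_a(x):
    return [x[0] + x[1], x[0]]


def net_b(x):
    return [x[0] + x[1], -x[0]]


probe_inputs = [(1.0, 0.0), (2.0, 1.0)]
distance = behavior_distance(net_a, net_b, probe_inputs, output_weights=[1.0, 0.5])
```

Hidden-node activations could be appended to the output lists with their own weights if, as in the "Novelty of thinking" variant mentioned earlier, internal responses should count as behavior too.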

The above are my thoughts off the top of my head, and thus rather vague in spots... but hopefully may inspire some useful ideas.

No modification needed? Interesting! I'll definitely take a look at that, and so should anyone else who (unlike me) does have a cluster available.

Regarding the ARM cluster, I don't know; you will need to inquire of @bennr01. I had originally been thinking that you (from past discussions) have at least one cluster available for testing, before @bennr01 mentioned now having a cluster; I have no idea of @bennr01's experience with ARM clusters.

drallensmith commented 7 years ago

Regarding the idea of clustering input space based on disagreements between learners - it reminded me of an interesting couple of articles on the intersection between machine learning and neuroscience - see doi:10.3398/fncom.2011.00050 and PMC5021692. The articles citing the latter may also be of interest, both on pubmedcentral and via Google Scholar, such as PMC5003864 - and in turn the first pubmedcentral article citing it, on representational models - and doi:10.3390/brainsci6040044. One example of an idea for input space clustering - train a pair of autoencoders/decoders - as in, take the input space and try to get it out the other end of a network that is restricted to having fewer connections between the halves than the number of variables in the input space, then take a look at those connections.

There are also some on ways of understanding neural networks, such as 10.1.1.713.1915 if (unlike me) you perceive things primarily visually.

evolvingfridge commented 7 years ago

Thank you very much, I doubt I would have found them on my own in the near future without you! I am still reading the docs you provided; this is exactly what I was missing. I will not say much beyond this point until I have read them at least a couple more times.

drallensmith commented 7 years ago

I'm very happy to help!

evolvingfridge commented 7 years ago

Any suggestions on the idea of using a pure-Python web server instead of a curses-based reporter? The reason I think a web server might be a better option is Windows/Linux compatibility. It would work in the following way: the reporter dumps data to disk, and a web server in a separate, independent process reads the data and presents it to the user as a web page; a small piece of JavaScript auto-refreshes the HTML page.
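A minimal standard-library sketch of the design described above might look like the following. It is only one possible shape, under stated assumptions: `JsonDumpReporter` is a standalone stand-in whose `start_generation` and `post_evaluate` methods mirror the hook names on neat-python's `BaseReporter` (a real version would presumably subclass it), and the file name, `render_page`, `make_server`, and `FakeGenome` are hypothetical.

```python
import json
import os
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer


class JsonDumpReporter:
    """Writes per-generation statistics to a JSON file on disk."""

    def __init__(self, path):
        self.path = path
        self.generation = 0

    def start_generation(self, generation):
        self.generation = generation

    def post_evaluate(self, config, population, species, best_genome):
        # Assumes `population` maps genome ids to objects with a `fitness` attribute.
        fitnesses = [g.fitness for g in population.values()]
        stats = {
            "generation": self.generation,
            "best_fitness": max(fitnesses),
            "mean_fitness": sum(fitnesses) / len(fitnesses),
        }
        # Write to a temp file, then rename, so a reader never sees a partial file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(stats, f)
        os.replace(tmp, self.path)


def render_page(path, refresh_ms=2000):
    """Build an HTML page from the stats file; an unreadable file gives an empty table."""
    try:
        with open(path) as f:
            stats = json.load(f)
    except (OSError, ValueError):
        stats = {}
    rows = "".join(f"<tr><td>{k}</td><td>{v}</td></tr>" for k, v in stats.items())
    return (
        "<html><body><table>" + rows + "</table>"
        f"<script>setTimeout(function () {{ location.reload(); }}, {refresh_ms});</script>"
        "</body></html>"
    )


def make_server(path, port=8000):
    """Create (but do not start) a local HTTP server that serves the stats page."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = render_page(path).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HTTPServer(("127.0.0.1", port), Handler)


# Demo with a throwaway directory and hypothetical genome objects.
class FakeGenome:
    def __init__(self, fitness):
        self.fitness = fitness


stats_path = os.path.join(tempfile.mkdtemp(), "neat_stats.json")
reporter = JsonDumpReporter(stats_path)
reporter.start_generation(3)
reporter.post_evaluate(None, {1: FakeGenome(1.0), 2: FakeGenome(3.0)}, None, None)
page = render_page(stats_path)
```

In the separate viewer process one would call `make_server(stats_path).serve_forever()` and open `http://127.0.0.1:8000/` in a browser. Keeping the reporter and the server coupled only through a file means the evolution run does not depend on the viewer being up.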