Closed UniqueFool closed 8 years ago
1) if/how can this be used to directly deal with ASTs (parse trees) - i.e. beyond GAs and so that this can be used to seed/create, mutate syntax trees (e.g. from python ast)
Can you elaborate on this? I'm not familiar with ASTs. There are functions within TPOT that generate and mutate the GP trees that represent machine learning pipelines, but those are currently hidden from the user.
2) if there are any plans to support OpenCL, e.g. for running things concurrently using GPUs or idle CPU cores ?
My boss is very much pushing for GPU support, so we may go that way eventually, but currently we're still focusing on fully developing the TPOT functionality (e.g., adding more pipeline operators #45 / #46) and expanding support to other ML problems (e.g., regression #30). We'll be looking at optimizations such as GPU support after we've reached a fairly stable state for TPOT.
Currently, we set n_jobs=-1 everywhere possible in the sklearn code to support multithreading. Random forests, for example, will make use of all available cores when fitting and predicting. It looks like it may be possible to support multithreading in DEAP (the GA library) as well.
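To make the DEAP remark a bit more concrete, here is a minimal sketch of the general idea: fitness evaluation goes through a swappable map function, so a thread pool's map can stand in for the built-in one. This uses only the standard library. The names evaluate and evaluate_population are made up for illustration and are not part of TPOT or DEAP, and the toy fitness is only a stand-in for fitting a pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(individual):
    """Toy fitness (assumption): sum of genes, standing in for a pipeline fit."""
    return sum(individual)


def evaluate_population(population, map_func=map):
    """Score every individual with whichever map implementation is passed in."""
    return list(map_func(evaluate, population))


population = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]

# Serial evaluation with the built-in map.
serial_scores = evaluate_population(population)

# Same evaluation, but dispatched through a thread pool's map.
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded_scores = evaluate_population(population, map_func=pool.map)
```

Note that in CPython, threads will not speed up pure-Python, CPU-bound fitness functions; a process-based map would be the usual substitute there, with the same calling pattern.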
Regarding abstract syntax trees (ASTs): frameworks like DEAP can support configurable input/output tree formats, which is to say that a DSL (domain-specific language) can be used internally by the GP framework for representation and for crossover/mutation.
The powerful thing here is to support a programming language like Python as both the output and the input, similar to LISP. In other words, you could throw Python code at the GP framework, in terms of functions and building blocks (e.g. a subset of Python), let TPOT mutate that, and then dump the resulting Python code to a file.
For simplicity, let's imagine a python "hello world" script that is fed to tpot for functions/terminals, with its fitness function requiring it to output "Hello tpot", and provide the script as python-code.
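A rough sketch of that "hello world" idea, using only Python's standard ast module (ast.unparse needs Python 3.9+). Nothing here is TPOT or DEAP code: SwapGreeting, run_and_capture and fitness are hypothetical names, the "mutation" is hand-written rather than evolved, and the fitness measure is just one arbitrary choice.

```python
import ast
import contextlib
import io

SOURCE = 'print("Hello world")'


class SwapGreeting(ast.NodeTransformer):
    """Replace one string constant with another (a hand-written 'mutation')."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Constant(self, node):
        if isinstance(node.value, str) and node.value == self.old:
            return ast.copy_location(ast.Constant(value=self.new), node)
        return node


def run_and_capture(tree):
    """Compile an AST module, execute it, and return what it printed."""
    code = compile(ast.fix_missing_locations(tree), "<evolved>", "exec")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


def fitness(tree, target="Hello tpot"):
    """Matching character positions minus a length penalty (higher is better)."""
    output = run_and_capture(tree)
    matches = sum(a == b for a, b in zip(output, target))
    return matches - abs(len(output) - len(target))


tree = ast.parse(SOURCE)                        # Python source in
mutated = SwapGreeting("Hello world", "Hello tpot").visit(ast.parse(SOURCE))
evolved_source = ast.unparse(mutated)           # Python source out
```

In a real GP setup the transformer would be replaced by random mutation and crossover over subtrees, with fitness deciding which variants survive.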
Thanks for your clarifying comments regarding GPU support. You will probably want to take a look at pyopencl sooner or later.
I see what you mean now. I've heard of researchers using GP as a way to create computer programs that take given inputs and produce a specific output. With TPOT, the idea is to constrain the available grammar to only machine learning operators, with the hope that such constraints will aid in faster discovery of effective pipelines. Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.
My boss, Jason Moore, has actually developed a system similar to what you propose. He has dozens of papers out on it now; here's one of them. His GP system evolves both the rules and the features that are used to make the classification. The big difference with his work is that he's not evolving Python code; rather, he's evolving mathematical expressions.
Thanks for your response !
Opening the entire Python language to GP entails a much larger search space, so the idea here is to take pre-built bits of code (primarily from sklearn) and use them as the building blocks.
Actually, a configurable subset will do - in fact, you will see that this is how "deap" works: you can register primitives/terminals and use Python callbacks for those, while specifying their signature/arity, including even strong typing: http://deap.gel.ulaval.ca/doc/default/examples/gp_symbreg.html
The example there uses just a handful of Python callbacks for the tree representation, which makes it possible to use Python as both the input and the output for the trees that GP manipulates to evolve algorithms.
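To illustrate the "register primitives with an arity" idea without depending on any library, here is a hand-rolled sketch. It is deliberately not DEAP's API (DEAP's gp module has its own PrimitiveSet for this); PRIMITIVES, TERMINALS, random_tree, to_source and compile_tree are all made-up names, and there is no typing or evolution loop.

```python
import operator
import random

# Hypothetical registry: name -> (callable, arity). Not DEAP's API.
PRIMITIVES = {
    "add": (operator.add, 2),
    "mul": (operator.mul, 2),
    "neg": (operator.neg, 1),
}
TERMINALS = ["x", "1"]


def random_tree(rng, depth):
    """Grow a full random expression tree as nested tuples (name, child, ...)."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(PRIMITIVES))
    _, arity = PRIMITIVES[name]
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))


def to_source(tree):
    """Render a tree as a Python expression string (the 'output' side)."""
    if isinstance(tree, str):
        return tree
    name, *children = tree
    return f"{name}({', '.join(to_source(child) for child in children)})"


def compile_tree(tree):
    """Turn a tree into a callable f(x) by evaluating its rendered source."""
    namespace = {name: func for name, (func, _) in PRIMITIVES.items()}
    return eval(f"lambda x: {to_source(tree)}", namespace)


rng = random.Random(0)
tree = random_tree(rng, depth=2)
func = compile_tree(tree)

# A fixed tree, to show the round trip from tree to source to callable.
fixed = ("add", ("mul", "x", "x"), ("neg", "1"))
```

The point is only that the tree renders to plain Python and compiles back into a callable, so the same language serves as both representation and result.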
The paper you mentioned looks interesting. Note that this way of using Python to literally support "recursion" is very powerful, as it allows genetic metaprogramming, i.e. a genetic program used to modify a GP to evolve algorithms: https://mitpress.mit.edu/sites/default/files/titles/alife/0262297140chap52.pdf
Regarding your comments on GPU support, DEAP is using SCOOP
http://www.randalolson.com/2015/11/15/introducing-tpot-the-data-science-assistant/
Given that this is very much work in progress, I am primarily wondering:
1) if/how can this be used to directly deal with ASTs (parse trees) - i.e. beyond GAs and so that this can be used to seed/create, mutate syntax trees (e.g. from python ast) and 2) if there are any plans to support OpenCL, e.g. for running things concurrently using GPUs or idle CPU cores ?
Thanks
(note that numpy based code can often be easily moved to OpenCL using pyOpenCL)