ShuhuaGao / geppy

A framework for gene expression programming (an evolutionary algorithm) in Python
https://geppy.readthedocs.io/en/latest/
GNU Lesser General Public License v3.0

geppy: a gene expression programming framework in Python

geppy is a computational framework dedicated to Gene Expression Programming (GEP), which was proposed by C. Ferreira in 2001 [1]. geppy is developed in Python 3.

What is GEP?

Gene Expression Programming (GEP) is a popular and established evolutionary algorithm for automatic generation of computer programs and mathematical models. It has found wide applications in symbolic regression, classification, automatic model design, combinatorial optimization and real parameter optimization problems [2].

GEP can be seen as a variant of traditional genetic programming (GP). It uses simple linear chromosomes of fixed length to encode the genetic information. Though each chromosome (and each of its genes) is of fixed length, it can produce expression trees of various sizes thanks to its genotype-phenotype expression system. Many experiments show that GEP is more efficient than GP, and the trees evolved by GEP tend to be smaller than those evolved by GP.
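To make the genotype-phenotype mapping concrete, here is a minimal sketch in plain Python (it does not use geppy, and it is not how geppy implements decoding). It assumes the usual GEP gene layout of a head followed by a tail that holds only terminals, and reads the gene level by level to build a tree. The symbol set in `ARITY` and the helper names are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical function set: symbol -> arity. "Q" stands for square root.
# Any symbol not listed here is treated as a terminal (arity 0).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def tail_length(head_length, max_arity):
    """Tail length that guarantees every gene decodes to a complete tree."""
    return head_length * (max_arity - 1) + 1


def decode(gene):
    """Decode a gene (a string of symbols) breadth-first into a nested list tree.

    Each node is a list [symbol, child, child, ...].
    Returns (root, number_of_symbols_actually_used).
    """
    symbols = iter(gene)
    root = [next(symbols)]
    queue = deque([root])
    used = 1
    while queue:
        node = queue.popleft()
        # Attach as many children as the node's arity requires.
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols)]
            used += 1
            node.append(child)
            queue.append(child)
    return root, used


def to_infix(node):
    """Render a decoded tree as a readable infix string."""
    symbol, children = node[0], node[1:]
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    left, right = (to_infix(child) for child in children)
    return f"({left} {symbol} {right})"


# Three genes of the same length (head 4 + tail 5) give trees of different sizes.
for gene in ("a+*bcdabc", "+a*bcdabc", "*+-Qabcda"):
    tree, used = decode(gene)
    print(gene, "->", to_infix(tree), f"({used} of {len(gene)} symbols used)")
```

All three genes have nine symbols, yet they decode to trees with one, five, and eight nodes; the unused symbols at the end of each gene are simply ignored.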

geppy and DEAP

geppy is built on top of the excellent evolutionary computation framework DEAP for rapid prototyping and testing of ideas with GEP. DEAP provides fundamental support for GP but lacks support for GEP. geppy tries its best to follow the style of DEAP and to maintain compatibility with DEAP's major infrastructure. In other words, geppy may to some degree be considered a DEAP plugin that adds GEP support. If you are familiar with DEAP, geppy is easy to grasp. Comprehensive documentation is also available.

Features

Installation

From PyPI (recommended)

pip install geppy

From source

You can install it from sources.

  1. First download or clone this repository
    git clone https://github.com/ShuhuaGao/geppy
  2. Change into the root directory, i.e., the one containing the setup.py file and install geppy using pip
    cd geppy
    pip install .

Documentation

Check the geppy documentation for GEP theory, a comprehensive introduction to geppy's API, and tutorials and examples of typical usage.

Examples

A getting-started example is presented in the Jupyter notebook Boolean model identification, which infers a Boolean function from given input-output data with GEP. More examples are listed below.

Simple symbolic regression

  1. Boolean model identification (Getting started with no constants involved)

  2. Simple mathematical expression inference (Constants finding with ephemeral random constants (ERC))

  3. Simple mathematical expression inference with the GEP-RNC algorithm (Demonstrating the GEP-RNC algorithm for numerical constant evolution)

Advanced symbolic regression

  4. Improving symbolic regression with linear scaling (Use the linear scaling technique to evolve models with continuous real constants more efficiently)

  5. Use the GEP-RNC algorithm with linear scaling on the UCI Power Plant dataset (See how to apply GEP-based symbolic regression to a real machine learning dataset)
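The linear scaling idea used in examples 4 and 5 can be sketched as follows. This is a minimal illustration in plain Python, assuming the ordinary least-squares form of linear scaling: given the raw outputs of an evolved model, find the intercept `a` and slope `b` that minimize the squared error of `a + b * prediction`. The function name is hypothetical, and the notebooks may implement this differently (for example with NumPy).

```python
def linear_scaling(predictions, targets):
    """Return (a, b) minimizing sum((t - (a + b * p)) ** 2) over all samples."""
    n = len(targets)
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    if var_p == 0.0:
        # The model output is constant: the best fit is the mean of the targets.
        return mean_t, 0.0
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets))
    b = cov / var_p
    a = mean_t - b * mean_p
    return a, b


# The evolved model only needs to get the shape right; scaling finds the constants.
raw_outputs = [0.0, 1.0, 2.0, 3.0]
targets = [3.0, 5.0, 7.0, 9.0]  # equal to 3 + 2 * raw_output
a, b = linear_scaling(raw_outputs, targets)
scaled = [a + b * p for p in raw_outputs]
print(a, b, scaled)
```

Because `a` and `b` are computed analytically, evolution does not have to discover these two constants by itself, which is why the technique tends to speed up regression with real-valued constants.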

Requirements

Common pitfalls in using GP

Always keep in mind that evolution is random, so a function may receive any value as input. If you encounter issues such as "overflow", "nan" ("not a number"), or unreasonably huge values, the most likely reason is that you did not protect a potentially dangerous function. For example, if the sqrt function is in the function set, then while evaluating an individual evolved by geppy (or by GP in general), a negative input such as sqrt(-1.24) is likely to occur.

Refer to issues #28 #26 #4 for more details.
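A common remedy is to put "protected" versions of such functions into the function set instead of the raw ones. The sketch below shows one possible set of conventions (absolute value for sqrt, a fallback value of 1.0 for division by zero, a clamped argument for exp). The function names, fallback values, and thresholds are assumptions for illustration; choose whatever is sensible for your problem.

```python
import math


def protected_sqrt(x):
    """Square root that accepts negative inputs by using |x|."""
    return math.sqrt(abs(x))


def protected_div(a, b, eps=1e-6):
    """Division that returns 1.0 when the denominator is (nearly) zero."""
    if abs(b) < eps:
        return 1.0
    return a / b


def protected_exp(x, limit=50.0):
    """Exponential with the argument clamped to [-limit, limit] to avoid overflow."""
    return math.exp(max(-limit, min(limit, x)))


print(protected_sqrt(-1.24))   # same result as math.sqrt(1.24)
print(protected_div(3.0, 0.0)) # falls back to 1.0 instead of raising
print(protected_exp(1e6))      # finite, instead of OverflowError
```

These functions are then used in the function set in place of `math.sqrt`, the `/` operator, and `math.exp`.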

Reference

The bible of GEP is definitely C. Ferreira's monograph: Ferreira, C. (2006). Gene expression programming: mathematical modeling by an artificial intelligence (Vol. 21). Springer.

You can also get a lot of papers/documents by Googling 'gene expression programming'.

[1] Ferreira, C. (2001). Gene Expression Programming: a New Adaptive Algorithm for Solving Problems. Complex Systems, 13.

[2] Zhong, J., Feng, L., & Ong, Y. S. (2017). Gene expression programming: a survey. IEEE Computational Intelligence Magazine, 12(3), 54-72.

How to cite geppy

If you find geppy useful in your projects, please cite it so that more researchers and engineers can learn about it. A BibTeX entry for geppy is given below.

@misc{geppy_2020,
    author       = {Shuhua Gao},
    title        = {{geppy: a Python framework for gene expression programming}},
    month        = jul,
    year         = 2020,
    doi          = {10.5281/zenodo.3946297},
    version      = {0.1},
    publisher    = {Zenodo},
    url          = {https://github.com/ShuhuaGao/geppy}
}

Alternatively, if you want a more academic citation, you may cite our related paper:

@ARTICLE{learn_async,
  author={S. {Gao} and C. {Sun} and C. {Xiang} and K. {Qin} and T. H. {Lee}},
  journal={IEEE Transactions on Cybernetics}, 
  title={Learning Asynchronous Boolean Networks From Single-Cell Data Using Multiobjective Cooperative Genetic Programming}, 
  year={2020},
  volume={},
  number={},
  pages={1-15},
  doi={10.1109/TCYB.2020.3022430}}