Closed hengzhe-zhang closed 3 years ago
Typically, the "issues" page of a GitHub repository is intended for discussion about the development of the codebase. Your question isn't related to PyshGP specifically, but rather to the GP field as a whole. For future reference, I am sure you will get a more complete answer to your question in a different community, perhaps under the genetic programming tag on the Data Science Stack Exchange.
Lucky for you, the question you asked is something I am passionate about and have spent a long time researching and communicating, so I will give you a brief answer. :)
As you mentioned, trees are difficult to modify for a few reasons: 1) they bloat and 2) they are difficult to keep type-safe.
Most of the literature uses trees because most of the literature is focused on the symbolic regression/classification problem domain, where the programs are arithmetic formulas that contain only one data type: numbers. In that setting, it is trivial to modify a tree and keep it type-safe, because any subtree can replace any other. Stack-based and grammar-based methods were developed to evolve programs that use a "general" set of types, and thus can solve "general program synthesis" tasks.
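To make the type-safety point concrete, here is a minimal sketch of the stack-based idea: one stack per type, and any instruction that lacks its arguments is simply a no-op. The instruction names, the `run` and `mutate` helpers, and the token format are all hypothetical and chosen for illustration; this is not PyshGP's API, just a toy under those assumptions.

```python
import random

# Hypothetical toy instruction set (not PyshGP's API).
# Each instruction reads from typed stacks and does nothing
# if the stacks do not hold enough arguments.
def int_add(stacks):
    if len(stacks["int"]) >= 2:
        b, a = stacks["int"].pop(), stacks["int"].pop()
        stacks["int"].append(a + b)

def int_lt(stacks):
    if len(stacks["int"]) >= 2:
        b, a = stacks["int"].pop(), stacks["int"].pop()
        stacks["bool"].append(a < b)

def bool_not(stacks):
    if stacks["bool"]:
        stacks["bool"].append(not stacks["bool"].pop())

INSTRUCTIONS = {"int_add": int_add, "int_lt": int_lt, "bool_not": bool_not}
LITERALS = [0, 1, 2, True, False]

def run(program):
    """Execute a linear program (a list of literals and instruction names)."""
    stacks = {"int": [], "bool": []}
    for token in program:
        if isinstance(token, bool):      # check bool first: bool is a subclass of int
            stacks["bool"].append(token)
        elif isinstance(token, int):
            stacks["int"].append(token)
        else:
            INSTRUCTIONS[token](stacks)
    return stacks

def mutate(program, rng):
    """Replace one randomly chosen token with a random literal or instruction."""
    child = list(program)
    child[rng.randrange(len(child))] = rng.choice(list(INSTRUCTIONS) + LITERALS)
    return child

parent = [1, 2, "int_add", 3, "int_lt"]
child = mutate(parent, random.Random(0))
run(child)  # never raises, whatever token the mutation picked
```

Because each instruction only touches the stacks for its own types, the mutation operator does not need to know anything about types. A typed tree would instead have to check that the replacement subtree returns the type its parent expects.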
There is a lot of active research in stack-based, grammar-based, and linear GP. In case you haven't seen it, this paper covers all 3 families of methods in a wonderful comparison that considers the relative trade-offs.
In the GP domain, there are numerous types of GP methods. However, I was unable to find a systematic review of the appropriate application scenarios for those methods. I discovered that the majority of the current literature prefers the tree-based method, even though the tree-based representation can be difficult to modify using mutation or crossover operators. Stack-based genetic programming, linear genetic programming, and grammatical evolution appear to be competitive alternatives to traditional tree-based methods. Is there any literature that illustrates the advantages and disadvantages of these three methods? In my opinion, stack-based methods appear more promising in recent years because there is still a lot of work being done in this domain, whereas the other two methods receive little attention, but I am not sure if this is correct.