erp12 / pyshgp

Push Genetic Programming in Python.
http://erp12.github.io/pyshgp
MIT License
74 stars, 23 forks

Programs in output don't look like Push programs #138

Closed lspector closed 4 years ago

lspector commented 4 years ago

Programs in output look something like this:

Program(signature=ProgramSignature(output_stacks=['str'], arity=1, push_config=PushConfig(numeric_magnitude_limit=1000000000000.0, step_limit=500, collection_size_cap=1000, growth_cap=500, runtime_limit=10)), code=CodeBlock([Input(input_index=0), InstructionMeta(name='str_but_last', code_blocks=0), InstructionMeta(name='str_but_last', code_blocks=0), Input(input_index=0), InstructionMeta(name='str_but_last', code_blocks=0), InstructionMeta(name='str_but_last', code_blocks=0), InstructionMeta(name='str_concat', code_blocks=0)]))

I am guessing that the actual Push program in this case looks something like:

(in0 str_but_last str_but_last in0 str_but_last str_but_last str_concat)

Is that right? Can output of the actual Push program also be included?

erp12 commented 4 years ago

We can definitely add this, but I would like to keep the default string representation as it is.

One thing that might be confusing about this output is that the definition of a Program is a little different than usual. In pyshgp, a program holds an interpreter config and other metadata that will be used to run the program, so that the behavior will be consistent even after saving and loading the program or creating a new interpreter. If you want to focus on the code of the program, you can use my_program.code, but even that will not be as neat as what you are looking for.

Currently the output you are seeing is inherited from the pyrsistent base classes. pyrsistent is a Clojure-inspired library for persistent data structures and classes. One of the benefits is that the output is valid Python code (assuming you have the correct modules imported). You should be able to copy-paste that program into a scratch file, create an interpreter, and start passing inputs and running the program.
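The "output is valid Python code" property can be illustrated with a minimal sketch. The `Input` and `InstructionMeta` classes below are hypothetical stand-ins built with `dataclasses`, not pyshgp's pyrsistent-based classes; they only mimic the field names that appear in the printed program.

```python
from dataclasses import dataclass


# Stand-in classes for illustration only. pyshgp's real classes are built on
# pyrsistent and carry more behavior than these.
@dataclass(frozen=True)
class Input:
    input_index: int


@dataclass(frozen=True)
class InstructionMeta:
    name: str
    code_blocks: int


code = [Input(input_index=0), InstructionMeta(name="str_but_last", code_blocks=0)]

# repr() produces text that is itself a Python expression...
text = repr(code)
print(text)

# ...so evaluating it rebuilds an equal object, as long as the class names
# used in the text are importable or otherwise in scope.
rebuilt = eval(text)
print(rebuilt == code)
```

The same idea is what makes copy-pasting a printed program into a scratch file workable: the text is a constructor expression, so it only needs the right names in scope.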

I just opened #139 that adds a pretty_str() method to programs and code blocks. This will produce a string with approximately the output you are looking for. The next release should pick up this change.
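As a rough sketch of what a Push-style rendering involves (this is not the code from #139, and the stand-in classes and the `pretty` helper are hypothetical), each atom is reduced to a short token and each code block is wrapped in parentheses. The `input_N` token format is an assumption.

```python
from dataclasses import dataclass


# Hypothetical stand-ins that mirror the field names in the printed program.
@dataclass(frozen=True)
class Input:
    input_index: int


@dataclass(frozen=True)
class InstructionMeta:
    name: str
    code_blocks: int


def pretty(atom) -> str:
    """Render an atom, or a (possibly nested) list of atoms, as Push-like text."""
    if isinstance(atom, list):
        # A code block becomes a parenthesized, space-separated sequence.
        return "(" + " ".join(pretty(a) for a in atom) + ")"
    if isinstance(atom, Input):
        return f"input_{atom.input_index}"
    # Instructions are shown by name only.
    return atom.name


but_last = InstructionMeta(name="str_but_last", code_blocks=0)
concat = InstructionMeta(name="str_concat", code_blocks=0)
code = [Input(0), but_last, but_last, Input(0), but_last, but_last, concat]

rendered = pretty(code)
print(rendered)
```

Nested lists are handled by the same recursion, so a program containing inner code blocks would render with nested parentheses.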

As a warning, although this program representation may look similar to Clojush, pyshgp programs won't be executable in Clojush. Was that your goal?

erp12 commented 4 years ago

Closing this issue, but feel free to reopen if #139 doesn't solve the issue.

lspector commented 4 years ago

Great that the functionality now exists, but if I'm not mistaken this representation is not yet provided to the user by default when running the system. I think that it should be.

That is, when a new user does this:

  1. Downloads or clones the repository
  2. Runs pip install . --upgrade (is this always necessary?)
  3. Runs something like python examples/string_demo.py

then they should see in the output, in generational reports and in the display of the final solution (if a solution is found), results that look like Push programs. Printing internal, PyshGP-specific data structures is fine too, but if there's nothing that actually looks like a Push program then that will be confusing. It will still be hard, especially for new-comers, to understand how the evolved Push programs work, but if they're not even shown Push programs then it will be much harder.

Re: the question above about whether the goal is that they'll be executable in Clojush: no. Or rather, probably yes for simple programs that involve only instructions that are available in both systems (maybe with some renamings), and don't depend on edge cases involving size limits and the like. But the goal here isn't interoperability. The goal is for people who have some understanding of how Push works to be able to look at the output, see a Push program, and at least for simple cases to be able to understand something about the program structure and contents, and get an idea of what it does.

erp12 commented 4 years ago

Thanks, that clarifies things. I agree that we should make the pretty_str representation more front and center. Below are responses to individual questions and comments from your response.

> 1. Downloads or clones the repository

This isn't strictly required for most uses, and I wouldn't recommend it to new-comers. The most recent release version is always published to PyPI (the Python counterpart of Clojars) and can be installed with pip install pyshgp.

PyshGP is only a library. It provides base classes (Selector, VariationOperator, SearchAlgorithm) and some concrete sub-classes (e.g. LexicaseSelection, Alternation, GeneticAlgorithm). Most users are expected to have their own project that lists pyshgp as a dependency and implements its own sub-classes to customize evolution.
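The "library plus your own sub-classes" pattern can be sketched in a self-contained way. Everything below is a hypothetical stand-in: the `Selector` base class, its `select_one`/`select` methods, and the `Individual` record are assumptions made for illustration, and pyshgp's actual interfaces may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Individual:
    """Stand-in for an evolved individual with a summed error."""
    name: str
    total_error: float


class Selector(ABC):
    """Stand-in for a base class that a library would provide."""

    @abstractmethod
    def select_one(self, population: Sequence[Individual]) -> Individual:
        """Return a single parent from the population."""

    def select(self, population: Sequence[Individual], n: int = 1) -> List[Individual]:
        # Shared behavior lives in the base class; sub-classes only
        # implement select_one.
        return [self.select_one(population) for _ in range(n)]


class LowestErrorSelector(Selector):
    """User-defined sub-class: always pick the individual with the lowest error."""

    def select_one(self, population: Sequence[Individual]) -> Individual:
        return min(population, key=lambda ind: ind.total_error)


population = [Individual("a", 3.0), Individual("b", 0.5), Individual("c", 1.0)]
chosen = LowestErrorSelector().select(population, n=2)
print([ind.name for ind in chosen])
```

The user's project owns only the sub-class; the base class and the code that calls `select` stay in the installed library.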

It is also possible to register new PushType classes to work with new data types, and to register new instructions. Look at the point distance example to see how this is done without modifying the source. There are also guides in the documentation.

Using a fork/clone/download of the repository is only required when one wants to change the core of pyshgp. I welcome these changes from anyone, but working from a clone isn't required to get up and running, and it adds quite a bit to the learning curve. Python libraries require a particular project structure that will distract from the experience of learning about PushGP.

> 2. Runs pip install . --upgrade (is this always necessary?)

If you are modifying the source of pyshgp (aka working off a clone) then yes. This is one reason why that is a more difficult workflow that I would not recommend to new-comers.

If you pip install pyshgp to get the latest published release, you can copy-paste the examples into your own script and run them without the rest of the project. Be sure to use the correct version of the example files for the current release! See #135.

You will only need to run pip install pyshgp --upgrade when there is a new release version published that you want to use.

> then they should see in the output, in generational reports and in the display of the final solution (if a solution is found), results that look like Push programs.

I can add this. Currently there is no printing of programs that happens generationally. I already switched the solution printing in the examples to be pretty_str. What kind of information would you like to see printed every generation by default?

Also, as with the stuff I mentioned above, this behavior is customizable without changing the source code for pyshgp using the new TapManager functionality. In your own project (or problem file scripts) you can add something like this:

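from pyshgp.tap import Tap, TapManager  # import path assumed; check the docs for your version

class LSpectorGenerationTap(Tap):

    def pre(self, id: str, args, kwargs, obj=None):
        # args[0] is the SearchAlgorithm instance whose step() is being tapped.
        search = args[0]
        # Index 0 is taken to be the individual with the lowest total error.
        print("Best program:", search.population[0].program.pretty_str())

# Attach the tap to SearchAlgorithm.step by its fully qualified name.
TapManager.register("pyshgp.gp.search.SearchAlgorithm.step", LSpectorGenerationTap())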

This changes the side effect that runs at the start of every search step so that it prints the pretty_str of the program with the lowest total error. This feature is under-documented (#137).
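For readers unfamiliar with the idea, the general tap pattern (a registry of side-effect objects, keyed by a function's name and invoked around each call) can be sketched in a few lines. This is a toy illustration and not pyshgp's implementation: `TAPS`, `tapped`, `LoggingTap`, and the `demo.search.step` id are all hypothetical names.

```python
import functools


class Tap:
    """Base class for side effects; sub-classes override pre and/or post."""

    def pre(self, tap_id, args, kwargs):
        pass

    def post(self, tap_id, returned, args, kwargs):
        pass


class TapRegistry(dict):
    """Maps an id string to the tap that should run around that function."""

    def register(self, tap_id, tap):
        self[tap_id] = tap


TAPS = TapRegistry()


def tapped(tap_id):
    """Decorator that runs the registered tap (if any) before and after a call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            tap = TAPS.get(tap_id)  # looked up at call time, not decoration time
            if tap is not None:
                tap.pre(tap_id, args, kwargs)
            result = fn(*args, **kwargs)
            if tap is not None:
                tap.post(tap_id, result, args, kwargs)
            return result
        return wrapper
    return decorator


@tapped("demo.search.step")
def step(generation):
    return generation + 1


messages = []


class LoggingTap(Tap):
    def pre(self, tap_id, args, kwargs):
        messages.append(f"starting {tap_id} with generation {args[0]}")


# Registering after the function is defined still works because the
# wrapper consults the registry on every call.
TAPS.register("demo.search.step", LoggingTap())
result = step(4)
print(messages, result)
```

The point of the design is that user code changes what happens around `step` without editing `step` itself.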

lspector commented 4 years ago

Super helpful comments.

If a new user wants to run PushGP (not just tests), then they really do need to download an example or write their own, right?

The new users I'm thinking of won't already have or know how to write their own project. I'd like them to be able to see PushGP in action before being so committed that they'll figure out how to do that. If downloading/cloning the repository isn't the best way for them to do that, what would be? The reason I went down this path is because the only mention of running examples in the docs is in the section on downloading from source (BTW it says "Frome" there, both here and here).

Now that you've pointed out (again :-)) that pretty_str is now being called for solutions, and I've looked more carefully, I see that the Push program is indeed being printed. This is exactly what I wanted -- thanks! I guess it would be a little better if there was a header or something (like Solution Push program:), and headers also for the other things that are printed. One of those appears to be an error vector, but I don't know what the other one is:

Solution found.
(input_0 str_but_last str_but_last input_0 str_concat str_but_last str_but_last)
[['abcabc'], [''], [''], [''], ['TT'], ['leprechaleprecha'], ['zoomzoomzozoomzoomzo'], ['qwertyuiopaqwertyuiopa'], ['GallopTrotCantGallopTrotCant'], ['QuinoQuino'], ['_a_a']]
[0 0 0 0 0 0 0 0 0 0 0]
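To get a feel for what the printed program does, it can be stepped through with a toy stack interpreter. This is a sketch under assumptions and not pyshgp's interpreter: it uses a single string stack, takes str_but_last to mean "drop the last character", and takes str_concat to mean "append the top item to the one beneath it" (an ordering chosen because it is consistent with the outputs printed above). The inputs `"abcde"` and `"leprechaun"` are made up for the example.

```python
def run_push_like(program, text):
    """Run a flat list of instruction names against one string input."""
    stack = []
    for token in program:
        if token == "input_0":
            stack.append(text)
        elif token == "str_but_last":
            if stack:  # do nothing when there is no argument
                stack.append(stack.pop()[:-1])
        elif token == "str_concat":
            if len(stack) >= 2:
                top = stack.pop()
                second = stack.pop()
                stack.append(second + top)  # assumed operand order
        else:
            raise ValueError(f"unknown instruction: {token}")
    # The program's output is taken to be the top of the stack.
    return stack[-1] if stack else None


program = (
    "input_0 str_but_last str_but_last input_0 "
    "str_concat str_but_last str_but_last"
).split()

print(run_push_like(program, "abcde"))
```

Under these assumptions the program drops the last two characters of the input and repeats what remains, which is the kind of structure a reader can see directly in the Push form.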

It would indeed also be great if there could be generational reports. What would be involved in adding those?

erp12 commented 4 years ago

Before I respond to some of your questions, quick fyi.

I just published the v0.1.7 release containing the .pretty_str() method, some section headings and cleanup of printed output, and a bunch of other little things. Release notes can be found here: http://erp12.github.io/pyshgp/html/release_notes.html#v0-1-7-june-15-2020


> If a new user wants to run PushGP (not just tests)

This is another great example of why starting out in a clone of the repo adds extra work. A user shouldn't have to worry about tests. Tests are for the developers of the core pyshgp project to catch bugs and make sure there aren't any breaking changes.

Users can trust that pyshgp will work correctly as long as it is called correctly from within their code. If they work off of a clone, they will have to make sure any changes they make don't break tests, which is an added burden if the user is only trying to experiment with using pyshgp as opposed to building pyshgp.

> If a new user wants to run PushGP (not just tests), then they really do need to download an example or write their own, right?
>
> The new users I'm thinking of won't already have or know how to write their own project.

I think my word choice was poor. In Python, a "project" can be as simple as a single script. If a user wants to start working with pyshgp by running an example, they can copy any of the files in examples/ and run it without needing any other files from the pyshgp repository.

This is probably easier explained by "showing" rather than "telling" so I made a new repository as a demo.

https://github.com/erp12/pyshgp-demo

As the README in that repo suggests, if you have a recent enough version of Python and run pip install pyshgp (or pip install pyshgp --upgrade if you have an older version installed) you can run python demo.py to attempt to evolve a simplified FizzBuzz program.

In that demo script I document a few key things:

Hopefully this makes it clearer why using pyshgp as a library doesn't restrict what can be done and gets you started much faster. As a user of the library, you only have to understand the public, documented API of pyshgp. In contrast, when working off a clone of the pyshgp repo, you also have to understand the file organization, testing, documentation generation, release deployment, style checking, etc.

That said, some things are not parametrizable in pyshgp. For example, if someone wants to experiment with a new genome representation, that would require changes to pyshgp's internals. They would need the pyshgp source code to do that.

Things that are parametrizable by using pyshgp as a library:

Things that are not parametrizable, and would need internal hacking to experiment with:

I've done my best with the design of pyshgp to make as many things parametrizable as possible, but it's very hard to make something fully open-ended. I chose to take an opinionated approach with certain components of the system for sanity's sake.

After users learn how to use pyshgp they will eventually wish for something to be different. The addition of a missing feature. A replacement for a leaky abstraction. Better documentation. This is when they fork/clone pyshgp, get familiar with the code and tests, and open a Pull Request for their changes. Now they have become a contributor. They work with the pyshgp source code (all of it, including tests and docs) and have an understanding of how pyshgp is used and how it is built.

lspector commented 4 years ago

Really great!

Can a pointer to pyshgp-demo be added prominently to the readme for pyshgp, to direct new users there to see how to run it? Perhaps much of the comment above can also be incorporated into the pyshgp-demo and/or pyshgp readmes?

erp12 commented 4 years ago

Done in #144