cavalab / brush

An interpretable machine learning library
http://cavalab.org/brush/
GNU General Public License v3.0

Variation max size and depth #30

Closed gAldeia closed 1 year ago

gAldeia commented 1 year ago

This pull request attempts to solve Issue #27.

Tests:

The test file test_variation.cpp implements tests that check whether PARAMS["max_size"] and PARAMS["max_depth"] have any effect on the expressions after the variation methods are applied. The difficulty in testing these constraints is that expressions are generated using PTC2. While this is an excellent method, it is important to notice that:

The tests, then, are divided into two groups. The former checks the sizes/depths after variation while allowing for the additional nodes, and the latter creates expressions with PTC2 after subtracting the possible additional size/depth.

I also created some tests in test_program to check that PTC2 works properly.
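As a rough illustration of the kind of check these tests perform, the sketch below measures the size and depth of a tree and compares them against the limits. It is not the code in test_variation.cpp: `Node`, `tree_size`, `tree_depth`, and `within_limits` are hypothetical names, and it assumes size counts every node and depth counts levels (a single node has depth 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal tree node; the real program representation differs.
struct Node {
    std::vector<Node> children;
};

// Size: the total number of nodes in the tree.
std::size_t tree_size(const Node& n) {
    std::size_t total = 1;
    for (const Node& c : n.children) total += tree_size(c);
    return total;
}

// Depth: the number of levels, so a single node has depth 1 (assumed convention).
std::size_t tree_depth(const Node& n) {
    std::size_t deepest = 0;
    for (const Node& c : n.children) deepest = std::max(deepest, tree_depth(c));
    return 1 + deepest;
}

// The property a test would assert after applying a variation operator.
bool within_limits(const Node& root, std::size_t max_size, std::size_t max_depth) {
    assert(max_size >= 1 && max_depth >= 1);  // limits must admit a single node
    return tree_size(root) <= max_size && tree_depth(root) <= max_depth;
}
```

A test in the second group would, under these assumptions, generate the initial expression with a reduced budget and then assert `within_limits` on the offspring using the original limits.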

Mutation:

The function get_op_with_arg has a new parameter, max_arg_count, which specifies the maximum number of arguments that the returned node can have. Its default value keeps the behavior exactly as it was before I added the parameter.

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/search_space.h#L376-L384
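The pattern of a defaulted limit that leaves existing callers unaffected can be sketched as follows. This is not the signature of get_op_with_arg: `Op` and `pick_op` are hypothetical, and the sketch returns the first matching operator deterministically rather than sampling one.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a search-space operator: a name and its arity.
struct Op {
    std::string name;
    std::size_t arg_count;
};

// Returns the first operator whose arity does not exceed max_arg_count.
// The default is "no limit", so callers that omit the argument see the
// same result they would have seen before the parameter existed.
std::optional<Op> pick_op(
    const std::vector<Op>& ops,
    std::size_t max_arg_count = std::numeric_limits<std::size_t>::max()) {
    assert(!ops.empty());  // an empty operator set is a caller error here
    for (const Op& op : ops) {
        if (op.arg_count <= max_arg_count) return op;
    }
    return std::nullopt;  // no operator is small enough
}
```

Returning an empty optional when nothing fits is a choice made for this sketch; it forces the caller to handle the case where the remaining budget is too small for any operator.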

Mutation avoids exceeding the max size/depth by checking:

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/variation.h#L121-L128

The idea is that every mutation that can increase the size or depth should be inside this if block.

Although I could do a finer check for the depth (since inserting a node does not always increase the maximum depth of the tree), this would require enumerating the points where the mutation could be applied, creating an overhead.
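A minimal sketch of this conservative guard, assuming the program exposes its current size and depth: growth is allowed only while both are strictly below their limits, without inspecting where the node would be inserted. `Limits`, `can_grow`, `allowed_mutations`, and the mutation names are illustrative and not taken from variation.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the two limits read from the parameters.
struct Limits {
    std::size_t max_size;
    std::size_t max_depth;
};

// Conservative test: assumes any insertion may add one node and one level,
// so it can reject insertions that would in fact have kept the depth unchanged.
bool can_grow(std::size_t size, std::size_t depth, const Limits& lim) {
    return size < lim.max_size && depth < lim.max_depth;
}

// Mutations that never grow the tree are always offered; the growing one
// is offered only inside the guard.
std::vector<std::string> allowed_mutations(std::size_t size, std::size_t depth,
                                           const Limits& lim) {
    assert(size >= 1 && depth >= 1);  // a program has at least one node
    std::vector<std::string> options{"point", "delete"};
    if (can_grow(size, depth, lim)) {
        options.push_back("insert");
    }
    return options;
}
```

The check costs two comparisons per mutation, which is the trade-off described above: some valid insertions are rejected in exchange for not enumerating insertion points.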

Crossover:

Based on the child_point, I need to check, for every candidate in the other.Tree, if:

First, I get the allowed size and depth:

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/variation.h#L182-L186

Then I create an iterator and a lambda function to check whether the candidate is valid:

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/variation.h#L199-L208

This check can then simply be added to the previously implemented one (which checks whether the argument type is valid):

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/variation.h#L210-L221
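The crossover steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the code in variation.h. The allowed size is taken to be max_size minus the nodes the child keeps, and the allowed depth to be max_depth minus the number of ancestors above the crossover point. `Subtree` and `valid_candidates` are hypothetical names, and the argument-type check is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical summary of a candidate subtree from the other parent.
struct Subtree {
    std::size_t size;   // number of nodes in the subtree
    std::size_t depth;  // number of levels; a single node has depth 1
};

// Keeps only the candidates that fit in the room left at the crossover point.
// point_depth is the number of ancestors above the point (0 for the root).
std::vector<Subtree> valid_candidates(const std::vector<Subtree>& candidates,
                                      std::size_t child_size,
                                      std::size_t replaced_size,
                                      std::size_t point_depth,
                                      std::size_t max_size,
                                      std::size_t max_depth) {
    assert(replaced_size <= child_size);
    const std::size_t kept = child_size - replaced_size;
    // Guard against unsigned underflow: no room left means nothing fits.
    if (kept >= max_size || point_depth >= max_depth) return {};

    const std::size_t allowed_size = max_size - kept;
    const std::size_t allowed_depth = max_depth - point_depth;

    // The validity test is a lambda so it can be combined with other checks.
    auto fits = [allowed_size, allowed_depth](const Subtree& s) {
        return s.size <= allowed_size && s.depth <= allowed_depth;
    };

    std::vector<Subtree> out;
    std::copy_if(candidates.begin(), candidates.end(),
                 std::back_inserter(out), fits);
    return out;
}
```

Writing the test as a predicate means it could be and-ed with a type-compatibility predicate in a single pass over the candidates, which matches the description of adding the new check to the existing one.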

Small improvements:

There are some changes that I would like to point out here so we can discuss them. I marked them in the code with GUI TODO.

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/variation.h#L52

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/src/variation.h#L170

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/tests/cpp/test_program.cpp#L214-L222

https://github.com/cavalab/brush/blob/4ef1b9907333893a98c0a4facb64b2309eed0ba0/tests/cpp/test_program.cpp#L224-L227