cavalab / brush

An interpretable machine learning library
http://cavalab.org/brush/
GNU General Public License v3.0
2 stars 0 forks source link

Bug fix - Segmentation fault caused by insert mutation #35

Closed gAldeia closed 1 year ago

gAldeia commented 1 year ago

While I was learning how to adjust the weights of the mutations, my Python kernel was dying due to a segmentation fault. This pull request fixes that and also has some minor fixes and comments as well.


The segmentation fault was occurring in get_op_with_arg. The function takes the argument and return type to match, and returns a compatible operator to be inserted in the tree.

https://github.com/cavalab/brush/blob/546c2a792c9bdfc8358dbb84e6bb2f90a3f71ee6/src/search_space.h#L382-L430

However, in the previous implementation (code above), the max_size restriction that I added would ignore all operators in rare cases, making the select_random iterate over an empty collection (this was hard to find :sweat_smile: ). For example, if max_arg_count=1, and the compatible nodes have node.get_arg_count()>1, then no node would be selected, and select_randomly would create a segmentation fault.

To fix that, I made the insert mutation behave like PTC2 by relaxing the size limit control of the programs. Now the get_op_with_arg will avoid creating an empty collection, and the insert_mutation can create a program that exceeds the maximum size by the highest arity among the operators.

Other improvements:

gAldeia commented 1 year ago

Thanks for the suggestions, I am going to make a commit with these improvements!

I think we might want to move to making these SearchSpace functions return std::optional and handling failures more gracefully, because they may be unavoidable depending on the terminal and function set.

This is a good idea. I can try to implement this. Do you have any suggestions about how to handle cases when optional is null?

My suggestion would be to try to select the spot in the mutation 3 times and if it doesn't succeed in any, it returns the expression itself (as it is in the crossover). As for the crossover, I would just need to adapt the code to use the optional.

I don't know if this could increase the execution time, but --- if so ---, we could do a single attempt at crossover and mutation.

I have run into another situation where the search space get_X functions fail, which is when there isn't another node of the same type.

After start using the std::optional, I will work on fixing this as well.

gAldeia commented 1 year ago

While this branch fix one bug, I think we should not merge it right now, since I am actively modifying this branch to find more bugs.

Another reasons are:

  1. @lacava mentioned that we could start using std::optionalto handle failures more gracefully, and I agree. I started working on this improvement on another branch. I think we should not merge this branch until we start using std::optional effectively.
  2. As soon as I get more confident about the Multi-Armed Bandit learner implementations, they can be easily implemented in this new version of variations using the std::optional, so we can also wait for this integration before pushing into master.
  3. I also have spotted one problem while working on this branch that we should fix (issue #37).