gAldeia closed this issue 1 year ago.
After talking with @lacava, we found that boolean values were missing from the `signatures.h` file: https://github.com/cavalab/brush/blob/546c2a792c9bdfc8358dbb84e6bb2f90a3f71ee6/src/program/signatures.h#L196-L201
After adding them, I had to make some additional changes:
For 1, I created a new function that checks whether a node's `ret_type` is one of the unweightable types:
https://github.com/cavalab/brush/blob/25c86740dd9643220d031d502ee33bf01121720b/src/program/node.h#L43-L62
and started using it in the `Node` constructor.
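To illustrate the idea, here is a minimal Python sketch of such a check (the real implementation is C++ in `node.h`). The `DataType` members, `UNWEIGHTABLE_TYPES`, and `is_unweightable` are hypothetical names, and treating only boolean arrays as unweightable is an assumption made for the example:

```python
from enum import Enum, auto


class DataType(Enum):
    """Hypothetical stand-in for Brush's return types (names are illustrative)."""
    ArrayF = auto()  # floating-point array
    ArrayI = auto()  # integer array
    ArrayB = auto()  # boolean array


# Assumption: only boolean arrays are unweightable in this sketch.
UNWEIGHTABLE_TYPES = frozenset({DataType.ArrayB})


def is_unweightable(ret_type: DataType) -> bool:
    """Return True if a node with this return type must not carry a weight."""
    return ret_type in UNWEIGHTABLE_TYPES


class Node:
    def __init__(self, name: str, ret_type: DataType, weighted: bool = True):
        self.name = name
        self.ret_type = ret_type
        # The constructor consults the check, so boolean nodes start
        # unweighted even if the caller asked for a weight.
        self.is_weighted = weighted and not is_unweightable(ret_type)


x = Node("x1", DataType.ArrayF)
sex = Node("sex", DataType.ArrayB)
print(x.is_weighted, sex.is_weighted)  # True False
```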
I also changed how we manipulate `is_weighted`: we now use a getter and a setter (as we already do for other attributes). The setter performs a check before allowing a node to be weighted:
https://github.com/cavalab/brush/blob/25c86740dd9643220d031d502ee33bf01121720b/src/program/node.h#L248-L260
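A minimal Python sketch of the same getter/setter pattern, assuming the setter quietly keeps unweightable nodes unweighted (the actual C++ setter might instead throw or log). `Node`, `DataType`, and `UNWEIGHTABLE_TYPES` are hypothetical names:

```python
from enum import Enum, auto


class DataType(Enum):
    ArrayF = auto()  # floating-point array
    ArrayB = auto()  # boolean array


# Hypothetical set of return types that may never be weighted.
UNWEIGHTABLE_TYPES = frozenset({DataType.ArrayB})


class Node:
    def __init__(self, ret_type: DataType, weighted: bool = True):
        self.ret_type = ret_type
        # Route construction through the setter so the check lives in one place.
        self.is_weighted = weighted

    @property
    def is_weighted(self) -> bool:
        return self._weighted

    @is_weighted.setter
    def is_weighted(self, value: bool) -> None:
        # Weighting can only be enabled for weightable return types; for the
        # others the flag is forced to False instead of raising (an assumption).
        self._weighted = bool(value) and self.ret_type not in UNWEIGHTABLE_TYPES


f = Node(DataType.ArrayF)
b = Node(DataType.ArrayB)
b.is_weighted = True  # request is rejected by the setter
print(f.is_weighted, b.is_weighted)  # True False
```

Because both the constructor and later assignments go through the setter, no code path can enable a weight on a boolean node.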
For 2, I implemented a separate function to get the weights of a boolean terminal (and changed the original one to require that the value is not boolean). The comment inside this function explains why it exists:
These changes produced a Brush version that passes all tests, including the new `test_data.cpp`, which I wrote specifically to fail because of the bug reported in this issue.
I think we can close this issue.
This was solved in PR #43.
Both Brush's C++ module and its Python wrapper work fine with datasets that do not contain any binary columns.
However, during some experiments with PMLB's adult dataset (which has two boolean columns), my Jupyter notebook's Python kernel eventually died, and fitting expressions that use one or more binary columns seems to be the cause.
It seems that the `sig_hash` created for Terminal nodes whose type is inferred inside `data.h` differs from what is stored in the `dispatch_table`. I tried to fix that without success, so I decided to open an issue here (I am still trying to fix it, though). Below is some evidence I gathered while trying to fix it.
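To make the suspected failure mode concrete, here is a hypothetical Python model of a hash-keyed dispatch table. None of these names (`sig_hash`, `dispatch_table`, `lookup`) reproduce Brush's C++ API; the sketch only shows that a terminal whose inferred type has no registered signature produces a lookup miss:

```python
from typing import Callable, Dict, Optional, Tuple


def sig_hash(ret: str, args: Tuple[str, ...]) -> int:
    """Hypothetical signature hash: combines return type and argument types."""
    return hash((ret, args))


dispatch_table: Dict[int, Callable[[], str]] = {}

# Register terminal signatures; "bool" is deliberately missing, mimicking a
# signature table without boolean entries.
for t in ("float", "int"):
    dispatch_table[sig_hash(t, ())] = lambda t=t: f"eval terminal as {t}"


def lookup(ret: str, args: Tuple[str, ...] = ()) -> Optional[Callable[[], str]]:
    # dict.get returns None on a miss instead of raising.
    return dispatch_table.get(sig_hash(ret, args))


print(lookup("float") is not None)  # True
print(lookup("bool") is None)       # True: the inferred bool terminal has no entry
```

In C++, using the result of a failed lookup (for example, dereferencing an end iterator or a null function pointer) is undefined behavior, which would be consistent with the kernel dying instead of raising a Python exception.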
Python wrapper
To get a backtrace of the core dump, I am using gdb after changing `setup.py` to compile the C++ module in debug mode. I converted `src/brush/D_TS_experiments.ipynb` into a Python script (.py) so that I could run it with gdb. For simplicity, this specific backtrace uses the `docs/examples/datasets/d_example_patients.csv` dataset, which contains one binary column (sex). When I run Brush's NSGA-II evolutionary algorithm, I get:
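The issue does not say how the notebook was converted; `jupyter nbconvert --to script` is the usual tool. As a standard-library-only alternative, here is a minimal sketch (the function name `notebook_to_script` is hypothetical) that relies on the notebook JSON layout, where each entry in `cells` has a `cell_type` and a `source`:

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path: str, py_path: str) -> int:
    """Concatenate the code cells of a Jupyter notebook into a .py file.

    Returns the number of code cells written. Markdown cells and outputs are
    dropped; IPython magics and shell escapes (lines starting with % or !)
    are commented out because plain `python` cannot run them.
    """
    nb = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # The notebook format stores source either as one string or a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in src.splitlines()
        ]
        chunks.append("\n".join(lines))
    Path(py_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(chunks)
```

For example, `notebook_to_script("src/brush/D_TS_experiments.ipynb", "D_TS_experiments.py")` would write a script that can then be run under gdb (the output filename here is only illustrative).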
The error backtrace is:
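Steps to get the backtrace:

1. Set `cfg = "debug"` in `setup.py`.
2. `pip install .`
3. `gdb python`

```python
from brush import BrushRegressor
import pandas as pd

if __name__ == '__main__':
    ...  # remainder of the script not included in the issue
```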
The test fails during the fit of the following expression: