cavalab / brush

An interpretable machine learning library
http://cavalab.org/brush/
GNU General Public License v3.0

fit generates invalid sig_hash for ArrayXb terminal nodes #37

Closed gAldeia closed 1 year ago

gAldeia commented 1 year ago

Both Brush's C++ module and its Python wrapper work fine with datasets that do not contain any binary column.

However, during some experiments with PMLB's adult dataset (which has 2 bool columns), my Jupyter notebook's Python kernel eventually died, and it seems that fitting expressions with one or more binary columns was the cause.

It seems that the sig_hash created for Terminal nodes whose type is inferred inside data.h differs from what is stored in the dispatch_table. I tried to fix that without success, so I decided to open an issue here (I am still trying to fix it, though).
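To make the suspected failure mode concrete, here is a minimal Python sketch of a dispatch table keyed by node type and signature hash. It is not Brush's implementation: the `DispatchTable` class, the `sig_hash` helper, and the use of type-name strings are hypothetical stand-ins, and the hashes it produces are unrelated to the ones in the logs. It only illustrates how a lookup fails when a terminal's inferred type was never registered.

```python
import hashlib


def sig_hash(ret_type: str, arg_types: tuple = ()) -> int:
    """Hypothetical stand-in for a signature hash: hash the type names."""
    key = ret_type + "(" + ",".join(arg_types) + ")"
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class DispatchTable:
    """Maps node type -> {signature hash -> callable}."""

    def __init__(self):
        self._map = {}

    def register(self, node_type, ret_type, func, arg_types=()):
        self._map.setdefault(node_type, {})[sig_hash(ret_type, arg_types)] = func

    def get(self, node_type, h):
        options = self._map.get(node_type, {})
        if h not in options:
            raise KeyError(
                f"sig_hash={h} not in map_.at({node_type}); options: {sorted(options)}"
            )
        return options[h]


table = DispatchTable()
# Only float and int terminals are registered; the boolean type is forgotten.
for type_name in ("ArrayXf", "ArrayXi"):
    table.register("Terminal", type_name, lambda column: column)

# A float terminal resolves, but a terminal inferred as boolean does not.
float_fn = table.get("Terminal", sig_hash("ArrayXf"))
try:
    table.get("Terminal", sig_hash("ArrayXb"))
    bool_lookup_failed = False
except KeyError:
    bool_lookup_failed = True
```

In this toy version the remedy is simply to register `"ArrayXb"` as well; whether the real fix is that simple depends on how Brush builds its signatures.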

Below is some evidence that I gathered while trying to fix that.

## Python wrapper

I used gdb to get a backtrace of the core dump, after changing setup.py to compile the C++ module in debug mode. I converted src/brush/D_TS_experiments.ipynb into a Python script (.py) so that I could run it under gdb. For this specific backtrace, I used the docs/examples/datasets/d_example_patients.csv dataset for simplicity, since it contains a single binary column (sex).

When I ran Brush's NSGAII evolutionary algorithm, I got:

FATAL ERROR brush/src/program/dispatch_table.h:172: sig_hash=577185359398356073 not in map_.at(Terminal)
options:
14884157073895229501
509529941281334733
13777882714371223207
17717457037689164349

terminate called without an active exception

Thread 1 "python" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.

The error backtrace is:

(gdb) backtrace
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352685376) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352685376, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffb5e13026 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#6  0x00007fffb5e11514 in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#7  0x00007fffb5e11566 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#8  0x00007fffb5e117a4 in __cxxabiv1::__cxa_rethrow () at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:136
#9  0x00007fffb15882ed in Brush::Util::HandleErrorThrow (err=..., 
    file=0x7fffb169a830 "brush/src/program/dispatch_table.h", line=172)
    at brush/src/util/error.cpp:16
#10 0x00007fffb13c5da8 in Brush::DispatchTable<true>::Get<Eigen::Array<bool, -1, 1, 0, -1, 1> > (
    this=0x7fffb19e5fa0 <Brush::dtable_fit>, n=Brush::NodeType::Terminal, sig_hash=577185359398356073)
    at brush/src/program/dispatch_table.h:172
#11 0x00007fffb13748d7 in tree_node_<Brush::Node>::fit<Eigen::Array<bool, -1, 1, 0, -1, 1> > (this=0x555556546430, 
    d=...) at brush/src/program/tree_node.h:63
#12 0x00007fffb1374976 in Brush::Operator<(Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::fit(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffc2ef, d=..., tn=...)
    at brush/src/program/split.h:231
#13 0x00007fffb131dfa6 in Brush::Operator<(Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::eval(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc2ef, 
    d=..., tn=..., weights=0x0) at brush/src/program/split.h:278
#14 0x00007fffb12fc25f in Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (d=..., tn=...)
    at brush/src/program/operator.h:305
#15 0x00007fffb13aa8d9 in std::__invoke_impl<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __f=@0x7fffffffc420: 0x7fffb12fc22d <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:61
#16 0x00007fffb133be16 in std::__invoke_r<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __fn=@0x7fffffffc420: 0x7fffb12fc22d <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)68719476736, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<bool, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>) at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:116
#17 0x00007fffb130268d in std::_Function_handler<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Eigen::Array<float, -1, 1, 0, -1, 1> (*)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::_M_invoke(std::_Any_data const&, Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (__functor=..., __args#0=..., 
    __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:291
#18 0x00007fffb13aa4a8 in std::function<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::operator()(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffc420, 
    __args#0=..., __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:591
#19 0x00007fffb133b925 in tree_node_<Brush::Node>::fit<Eigen::Array<float, -1, 1, 0, -1, 1> > (this=0x555556546370, 
    d=...) at brush/src/program/tree_node.h:64
#20 0x00007fffb134431c in Brush::Operator<(Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::get_kids<std::array<Eigen::Array<float, -1, 1, 0, -1, 1>, 1ul> >(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc5cf, d=..., tn=..., 
    weights=0x0) at brush/src/program/operator.h:113
#21 0x00007fffb1307965 in Brush::Operator<(Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::eval<std::array<Eigen::Array<float, -1, 1, 0, -1, 1>, 1ul>, float>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc5cf, d=..., tn=..., 
    weights=0x0) at brush/src/program/operator.h:199
#22 0x00007fffb12f7bfc in Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (d=..., tn=...) at brush/src/program/operator.h:305
#23 0x00007fffb13aa8d9 in std::__invoke_impl<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __f=@0x7fffffffc700: 0x7fffb12f7bca <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:61
#24 0x00007fffb133be16 in std::__invoke_r<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __fn=@0x7fffffffc700: 0x7fffb12f7bca <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)16, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:116
#25 0x00007fffb130268d in std::_Function_handler<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Eigen::Array<float, -1, 1, 0, -1, 1> (*)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::_M_invoke(std::_Any_data const&, Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (__functor=..., __args#0=..., 
    __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:291
#26 0x00007fffb13aa4a8 in std::function<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::operator()(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffc700, 
    __args#0=..., __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:591
#27 0x00007fffb133b925 in tree_node_<Brush::Node>::fit<Eigen::Array<float, -1, 1, 0, -1, 1> > (this=0x5555565482b0, 
    d=...) at brush/src/program/tree_node.h:64
#28 0x00007fffb136380c in Brush::Operator<(Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::get_kids<Eigen::Array<float, -1, 4, 0, -1, 4> >(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc8ef, d=..., tn=..., weights=0x0)
    at brush/src/program/operator.h:115
#29 0x00007fffb1316c45 in Brush::Operator<(Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true, void>::eval<Eigen::Array<float, -1, 4, 0, -1, 4>, float>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&, float const**) const (this=0x7fffffffc8ef, d=..., tn=..., weights=0x0)
    at brush/src/program/operator.h:199
#30 0x00007fffb12fa9ca in Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (d=..., tn=...) at brush/src/program/operator.h:305
#31 0x00007fffb13aa8d9 in std::__invoke_impl<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __f=@0x7fffffffca20: 0x7fffb12fa998 <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:61
#32 0x00007fffb133be16 in std::__invoke_r<Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1> (*&)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Brush::Data::Dataset const&, tree_node_<Brush::Node>&> (
    __fn=@0x7fffffffca20: 0x7fffb12fa998 <Brush::DispatchOp<Eigen::Array<float, -1, 1, 0, -1, 1>, (Brush::NodeType)8388608, Brush::Signature<Eigen::Array<float, -1, 1, 0, -1, 1> (Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>, Eigen::Array<float, -1, 1, 0, -1, 1>)>, true>(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/invoke.h:116
#33 0x00007fffb130268d in std::_Function_handler<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&), Eigen::Array<float, -1, 1, 0, -1, 1> (*)(Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::_M_invoke(std::_Any_data const&, Brush::Data::Dataset const&, tree_node_<Brush::Node>&) (__functor=..., __args#0=..., 
    __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:291
#34 0x00007fffb1b364b8 in std::function<Eigen::Array<float, -1, 1, 0, -1, 1> (Brush::Data::Dataset const&, tree_node_<Brush::Node>&)>::operator()(Brush::Data::Dataset const&, tree_node_<Brush::Node>&) const (this=0x7fffffffca20, 
    __args#0=..., __args#1=...)
    at miniconda3/envs/brush/x86_64-conda-linux-gnu/include/c++/12.2.0/bits/std_function.h:591
#35 0x00007fffb1b269c9 in tree_node_<Brush::Node>::fit<Eigen::Array<float, -1, 1, 0, -1, 1> > (this=0x555556548370, 
    d=...) at brush/src/bindings/../program/tree_node.h:64
#36 0x00007fffb1b26a3c in Brush::Program<(Brush::ProgramType)0>::fit (this=0x5555575a33b0, d=...)
    at brush/src/bindings/../program/program.h:100
#37 0x00007fffb1b3659a in pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}::operator()(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&) const (__closure=0x555556700228, c=0x5555575a33b0, args#0=...)
    at miniconda3/envs/brush/include/pybind11/pybind11.h:110
#38 0x00007fffb1b6eaf5 in pybind11::detail::argument_loader<Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&>::call_impl<Brush::Program<(Brush::ProgramType)0>&, pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&, 0ul, 1ul, pybind11::detail::void_type>(pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && (
    this=0x7fffffffcb80, f=...) at miniconda3/envs/brush/include/pybind11/cast.h:1443
#39 0x00007fffb1b5dc51 in pybind11::detail::argument_loader<Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&>::call<Brush::Program<(Brush::ProgramType)0>&, pybind11::detail::void_type, pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&>(pybind11::cpp_function::cpp_function<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&) && (this=0x7fffffffcb80, f=...) at miniconda3/envs/brush/include/pybind11/cast.h:1411
#40 0x00007fffb1b46dea in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}, Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&&, Brush::Program<(Brush::ProgramType)0>& (*)(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (
    __closure=0x0, call=...) at miniconda3/envs/brush/include/pybind11/pybind11.h:248
#41 0x00007fffb1b46efb in pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}, Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(pybind11::cpp_function::initialize<Brush::Program<(Brush::ProgramType)0>&, Brush::Program<(Brush::ProgramType)0>, Brush::Data::Dataset const&, pybind11::name, pybind11::is_method, pybind11::sibling, char [24]>(Brush::Program<(Brush::ProgramType)0>& (Brush::Program<(Brush::ProgramType)0>::*)(Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&)#1}&&, Brush::Program<(Brush::ProgramType)0>& (*)(Brush::Program<(Brush::ProgramType)0>*, Brush::Data::Dataset const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, char const (&) [24])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
    at miniconda3/envs/brush/include/pybind11/pybind11.h:223
#42 0x00007fffb1a93a02 in pybind11::cpp_function::dispatcher (self=0x7fffb1e63a80, args_in=0x7fffaff04dc0, 
    kwargs_in=0x0) at miniconda3/envs/brush/include/pybind11/pybind11.h:939
#43 0x0000555555755497 in cfunction_call (func=0x7fffb1e77920, args=<optimized out>, kwargs=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Objects/methodobject.c:542
#44 0x00005555557314d4 in _PyObject_MakeTpCall (tstate=0x555555ad58b8 <_PyRuntime+166328>, callable=0x7fffb1e77920, 
    args=<optimized out>, nargs=<optimized out>, keywords=0x0) at /usr/local/src/conda/python-3.11.2/Objects/call.c:214
#45 0x000055555573df85 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, 
    throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.2/Python/ceval.c:4772
#46 0x0000555555784e5d in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb02a0, 
    tstate=0x555555ad58b8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_ceval.h:73
#47 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, func=0x7fffb1eb11c0, tstate=0x555555ad58b8 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.2/Python/ceval.c:6435
#48 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=<optimized out>, func=0x7fffb1eb11c0) at /usr/local/src/conda/python-3.11.2/Objects/call.c:393
#49 _PyObject_VectorcallTstate (tstate=0x555555ad58b8 <_PyRuntime+166328>, callable=0x7fffb1eb11c0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_call.h:92
#50 0x0000555555784bf3 in method_vectorcall (method=method@entry=0x7fffaff06200, args=args@entry=0x7fffb1dce318, nargsf=<optimized out>, kwnames=0x7fffb1e972e0)
    at /usr/local/src/conda/python-3.11.2/Objects/classobject.c:59
#51 0x000055555576f57d in _PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7fffaff06200, func=0x555555784b10 <method_vectorcall>, tstate=0x555555ad58b8 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.2/Objects/call.c:257
#52 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7fffaff06200, tstate=0x555555ad58b8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.2/Objects/call.c:328
#53 PyObject_Call (callable=0x7fffaff06200, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.2/Objects/call.c:355
#54 0x0000555555841bd8 in partial_call (pto=0x7fffaff0cae0, args=0x7fffb1e97a60, kwargs=<optimized out>) at /usr/local/src/conda/python-3.11.2/Modules/_functoolsmodule.c:324
#55 0x00005555557314d4 in _PyObject_MakeTpCall (tstate=0x555555ad58b8 <_PyRuntime+166328>, callable=0x7fffaff0cae0, args=<optimized out>, nargs=<optimized out>, keywords=0x0)
    at /usr/local/src/conda/python-3.11.2/Objects/call.c:214
#56 0x00005555557aaf52 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffffffd5e0, callable=0x7fffaff0cae0, tstate=0x555555ad58b8 <_PyRuntime+166328>)
    at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_call.h:92
#57 map_next (lz=<optimized out>) at /usr/local/src/conda/python-3.11.2/Python/bltinmodule.c:1369
#58 0x00005555557caf12 in zip_next (lz=0x7fffb1e8eb40) at /usr/local/src/conda/python-3.11.2/Python/bltinmodule.c:2788
#59 0x000055555573dd6e in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at /usr/local/src/conda/python-3.11.2/Include/object.h:133
#60 0x00005555557fbb9e in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff7fb0020, tstate=0x555555ad58b8 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.2/Include/internal/pycore_ceval.h:73
#61 _PyEval_Vector (tstate=0x555555ad58b8 <_PyRuntime+166328>, func=0x7ffff6dcdf80, locals=0x7ffff6df2280, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Python/ceval.c:6435
#62 0x00005555557fb12f in PyEval_EvalCode (co=<optimized out>, globals=0x7ffff6df2280, locals=<optimized out>) at /usr/local/src/conda/python-3.11.2/Python/ceval.c:1154
#63 0x000055555581d49c in run_eval_code_obj (tstate=0x555555ad58b8 <_PyRuntime+166328>, co=0x555555c049d0, globals=0x7ffff6df2280, locals=0x7ffff6df2280)
    at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:1714
#64 0x0000555555819994 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff6df2280, locals=0x7ffff6df2280, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:1735
#65 0x000055555582e912 in pyrun_file (fp=fp@entry=0x555555b3f520, filename=filename@entry=0x7ffff6d965b0, start=start@entry=257, globals=globals@entry=0x7ffff6df2280, 
    locals=locals@entry=0x7ffff6df2280, closeit=closeit@entry=1, flags=0x7fffffffda58) at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:1630
#66 0x000055555582e235 in _PyRun_SimpleFileObject (fp=0x555555b3f520, filename=0x7ffff6d965b0, closeit=1, flags=0x7fffffffda58) at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:440
#67 0x000055555582e003 in _PyRun_AnyFileObject (fp=0x555555b3f520, filename=0x7ffff6d965b0, closeit=1, flags=0x7fffffffda58) at /usr/local/src/conda/python-3.11.2/Python/pythonrun.c:79
#68 0x00005555558280d6 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7ffff6d965b0, program_name=0x7ffff6d2ec60) at /usr/local/src/conda/python-3.11.2/Modules/main.c:360
#69 pymain_run_file (config=0x555555abb900 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.2/Modules/main.c:379
#70 pymain_run_python (exitcode=0x7fffffffda50) at /usr/local/src/conda/python-3.11.2/Modules/main.c:601
#71 Py_RunMain () at /usr/local/src/conda/python-3.11.2/Modules/main.c:680
#72 0x00005555557e9819 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.2/Modules/main.c:734
#73 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x5555557e9770 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffdca8) at ../sysdeps/nptl/libc_start_call_main.h:58
#74 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x5555557e9770 <main>, argc=2, argv=0x7fffffffdca8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdc98) at ../csu/libc-start.c:392
#75 0x00005555557e96b1 in _start ()

Steps to get the backtrace:

  1. Set cfg = "debug" in setup.py
  2. From the root: pip install .
  3. Enter gdb with gdb python
  4. Create the following script:

from brush import BrushRegressor
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv('docs/examples/datasets/d_example_patients.csv')
    X = data.drop(columns='target')
    y = data['target']

    est = BrushRegressor().fit(X, y)

  5. Run the script with `(gdb) run <path_to_script_above.py>`
  6. Wait for the error, then call `backtrace`.
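Before running the steps above on another dataset, it can help to know which columns are likely to be treated as binary. The following is a small standard-library sketch; the `binary_columns` helper and the sample data are hypothetical, and the rule it applies (every value is 0/1 or true/false) is only an assumption about what counts as a binary column, not a description of the inference in data.h.

```python
import csv
import io

BINARY_TOKENS = {"0", "1", "true", "false"}


def binary_columns(csv_text: str) -> list:
    """Return names of columns whose non-empty values are all 0/1 or true/false."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in seen:
            value = row.get(name)
            if value is not None and value.strip() != "":
                seen[name].add(value.strip().lower())
    # A column qualifies only if it has values and all of them are binary tokens.
    return [name for name, values in seen.items() if values and values <= BINARY_TOKENS]


# Made-up sample data, not the contents of d_example_patients.csv.
sample = "sex,age,target\n0,34,1.5\n1,51,2.0\n1,29,0.7\n"
found = binary_columns(sample)
print(found)  # ['sex']
```

To apply it to a real file, read the file's text and pass it in, for example `binary_columns(open(path).read())`.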

## Failing test case

To check if this happens also in the C++ module, I've implemented a simple test case that works with the following data:
```
Dataset contains 3 samples and 5 features
x_0 <ArrayXb>: [false, true, false]
x_1 <ArrayXb>: [false, true, true]
x_2 <ArrayXi>: [2, 1, -3]
x_3 <ArrayXi>: [2, 1, 3]
x_4 <ArrayXf>: [2.1, 3.7, -5.2]
```

The test fails during the fit of the following expression:

Tree model for depth = 2, size= 4: Sub(1.00,If(x_0>1.00,x_4,1.00))
Name Sub, node Sub, feature , sig_hash 10001460114883919497
Name Constant, node Constant, feature C, sig_hash 17717457037689164349
Name SplitOn, node SplitOn, feature , sig_hash 13925856710854127623
Name Terminal, node Terminal, feature x_0, sig_hash 577185359398356073
Name Terminal, node Terminal, feature x_4, sig_hash 17717457037689164349
Name Constant, node Constant, feature C, sig_hash 17717457037689164349

PRG fit
FATAL ERROR brush/src/program/dispatch_table.h:172: sig_hash=577185359398356073 not in map_.at(Terminal)
options:
14884157073895229501
509529941281334733
13777882714371223207
17717457037689164349

terminate called without an active exception
Aborted (core dumped)
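
The output above already contains enough to pinpoint the offending node. As a quick cross-check in plain Python, with every value copied verbatim from the log, only the boolean feature x_0 carries a hash that is absent from the Terminal options:

```python
# Hashes registered for Terminal nodes, copied from the "options" list in the log.
registered_terminal_hashes = {
    14884157073895229501,
    509529941281334733,
    13777882714371223207,
    17717457037689164349,
}

# (feature, sig_hash) for each Terminal node in the printed tree.
terminals = [
    ("x_0", 577185359398356073),    # <ArrayXb>
    ("x_4", 17717457037689164349),  # <ArrayXf>
]

unregistered = [feature for feature, h in terminals if h not in registered_terminal_hashes]
print(unregistered)  # ['x_0']
```
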

gAldeia commented 1 year ago

After talking with @lacava, we found that boolean values were missing from the signatures.h file: https://github.com/cavalab/brush/blob/546c2a792c9bdfc8358dbb84e6bb2f90a3f71ee6/src/program/signatures.h#L196-L201

After adding them, I had to make some changes so that:

  1. Terminal nodes with boolean features cannot be weighted;
  2. get_weights knows about 1 and behaves accordingly.

For 1, I created a new function that checks whether a node's ret_type is one of the unweightable types: https://github.com/cavalab/brush/blob/25c86740dd9643220d031d502ee33bf01121720b/src/program/node.h#L43-L62 and started using it in the Node constructor.

I also changed how we manipulate is_weighted: we should now use a getter and a setter (as we already do for other attributes). The setter performs a check before allowing a node to be weighted: https://github.com/cavalab/brush/blob/25c86740dd9643220d031d502ee33bf01121720b/src/program/node.h#L248-L260
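The guarded getter/setter idea can be sketched in a few lines of Python. This is an illustration only, not a port of node.h: the class layout, the `UNWEIGHTABLE_TYPES` set, and the choice to silently ignore the request (rather than raise) are all assumptions.

```python
# Hypothetical names throughout; this mirrors the idea, not Brush's node.h.
UNWEIGHTABLE_TYPES = {"ArrayXb"}  # assumption: boolean arrays are the unweightable case


class Node:
    def __init__(self, name: str, ret_type: str, weighted: bool = True):
        self.name = name
        self.ret_type = ret_type
        self._is_weighted = False
        self.set_is_weighted(weighted)  # the constructor goes through the same check

    def is_weightable(self) -> bool:
        """True unless the node's return type is one of the unweightable types."""
        return self.ret_type not in UNWEIGHTABLE_TYPES

    def get_is_weighted(self) -> bool:
        return self._is_weighted

    def set_is_weighted(self, value: bool) -> None:
        # Refuse to enable weights on an unweightable node (assumed behavior).
        self._is_weighted = bool(value) and self.is_weightable()


x0 = Node("x_0", "ArrayXb")
x4 = Node("x_4", "ArrayXf")
x0.set_is_weighted(True)  # has no effect: boolean terminals stay unweighted
```

Routing the constructor through the setter keeps the check in one place, so a node can never start out in a state the setter would have rejected.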

For 2, I implemented a separate function to get the weights of a boolean terminal (and changed the original one to require that the value is not boolean). The comment inside this function explains why it exists:

https://github.com/cavalab/brush/blob/25c86740dd9643220d031d502ee33bf01121720b/src/program/operator.h#L10-L58

These changes resulted in a new Brush version that passes all tests, including the most recent test_data.cpp, which I wrote specifically to fail due to the bug reported in this issue.

I think we can close this issue.

gAldeia commented 1 year ago

This was solved in PR #43.