Closed irslushy closed 3 weeks ago
Yes, this would be nice to have.
The scikit-learn model does have class_weight, and this is exposed in the MLJ wrapper. Unfortunately, passing a Julia dict does not appear to work. Watch the linked issue for a possible workaround.
I haven't looked at the ScikitLearn.jl wrapper.
DecisionTree.jl gets only light maintenance from a few volunteers. If you'd like this feature added, your best chance is to make a PR, assuming you have the expertise. I'd be happy to review it if no one else has the time.
It would be worth looking at the Python code, because it is based on C code which I think was ported to DecisionTree.jl, but I don't recall any accommodation for weights there. Or maybe the C code was just for individual trees. Sorry, I don't remember just now.
My suggestion would be to support per-observation weights first, and build class weight support on top of that (by using an analogue of the tool you linked to, which is something like this).
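To make the "class weights on top of per-observation weights" idea concrete, here is a minimal sketch in plain Python (not Julia, and not DecisionTree.jl code). It assumes the "balanced" heuristic, weight = n_samples / (n_classes * class_count); the function names `balanced_class_weights` and `per_observation_weights` are hypothetical and only illustrate the mapping.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a dict mapping each class to n_samples / (n_classes * class_count).

    Assumption: the 'balanced' heuristic; rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_observation_weights(labels, class_weights):
    """Expand class weights into one weight per observation."""
    return [class_weights[label] for label in labels]


# Toy data: three observations of class "a", one of class "b".
labels = ["a", "a", "a", "b"]
class_weights = balanced_class_weights(labels)
weights = per_observation_weights(labels, class_weights)
```

With this in place, a tree learner only needs to understand per-observation weights; class weighting becomes a preprocessing step.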
Another workaround is to apply oversampling. MLJ now has a nice model wrapper, BalancedModel, supporting the popular schemes for doing this.
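For anyone unfamiliar with oversampling, the simplest scheme (random oversampling) can be sketched as follows. This is plain Python for illustration only and is not the BalancedModel API; the helper name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is topped up to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: class "a" has three rows, class "b" has one.
rows = [[0.1], [0.2], [0.3], [0.9]]
labels = ["a", "a", "a", "b"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

The resampled data can then be passed to any classifier that has no notion of weights, at the cost of a larger training set.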
I'm working on creating a Random Forest classification model for a dataset that has an unequal class balance. Other packages such as scikit-learn provide a "class weight" option (docs), which allows the minority class(es) to be weighted more heavily in the training of the individual trees. As far as I can tell, there isn't any functionality like that in any Julia decision tree implementation. Would this be possible to add?