JuliaAI / DecisionTree.jl

Julia implementation of Decision Tree (CART) and Random Forest algorithms

Feature Request: Class Weighting capabilities #229

Closed irslushy closed 3 weeks ago

irslushy commented 10 months ago

I'm working on creating a Random Forest classification model for a dataset that has an unequal class balance. Other packages such as scikit-learn provide a "class weight" option (docs), which allows the minority class(es) to be weighted more heavily when training the individual trees. As far as I can tell, no Julia decision tree implementation offers anything like that. Would this be possible to add?

ablaom commented 10 months ago

Yes, this would be nice to have.

The scikit-learn model does have class_weight, and this is exposed in the MLJ wrapper. Unfortunately, passing a Julia dict does not appear to work. Watch the linked issue for a possible workaround.

I haven't looked at the ScikitLearn.jl wrapper.

DecisionTree.jl is lightly maintained by a few volunteers. If you'd like this feature added, your best chance is to make a PR, assuming you have the expertise. I'd be happy to review if no one else has the time.

It would be worth looking at the Python code, because it is based on C code which I think was ported to DecisionTree.jl, but I don't recall any accommodation for weights there. Or maybe the C code was just for individual trees. Sorry, I don't remember just now.

My suggestion would be to support per-observation weights first, and build class weight support on top of that (by using an analogue of the tool you linked to, which is something like this).
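To make the layering concrete, here is a minimal sketch in plain Python of how class weights could be expanded into per-observation weights. It is not DecisionTree.jl or scikit-learn code. The helper names `balanced_class_weights` and `observation_weights` are hypothetical, and the weighting rule assumed here is the "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def observation_weights(labels, class_weights):
    """Hypothetical helper: expand a class -> weight mapping into one weight per observation."""
    return [class_weights[label] for label in labels]


# Toy data: class "a" is three times as common as class "b".
labels = ["a", "a", "a", "b"]
class_weights = balanced_class_weights(labels)
weights = observation_weights(labels, class_weights)
```

With per-observation weights supported by the tree-building code, a class weight option would only need this kind of expansion step in front of it.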

ablaom commented 3 weeks ago

Another workaround is to apply oversampling. MLJ now has a nice model wrapper, BalancedModel, supporting the popular schemes for doing this.
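For readers unfamiliar with the idea, the following is a rough sketch of plain random oversampling, written in Python for illustration only. It does not show the BalancedModel API. The function name `random_oversample` and the toy data are assumptions, and the scheme shown (resampling each minority class with replacement until it matches the largest class) is just the simplest of the common ones.

```python
import random
from collections import defaultdict


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate minority-class rows at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the class's own rows.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


rows = [[0.1], [0.2], [0.3], [0.9]]
labels = ["a", "a", "a", "b"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

The resampled data can then be passed to an unmodified classifier, which is why this works as a workaround when the model itself has no weighting support.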