ModelOriented / forester

Trees are all you need
https://modeloriented.github.io/forester/
GNU General Public License v3.0

Control forester::train model using weights #121

Closed Leprechault closed 8 months ago

Leprechault commented 9 months ago

I'd like to know whether there is a way to pass observation weights to the training function to account for unbalanced sample sizes. In my case the dataset covers 2 areas, "a" and "b" (x_categorical_1): area "a" has 4 values (kept small just for the example) and area "b" has 3. I'd rather not run several bootstraps of size 3; instead I'd like to give the model weights based on the sample size of each area. Is this possible in forester::train?

In my example:

library(forester)

# X data (predictors)
x_train <- data.frame(x_numeric_1=c(1,2,3,4,5,6,7),
  x_numeric_2=c(1,3,5,1,3,5,1),
  x_categorical_1=c("a","a","a","a","b","b","b"))

# Here I'd like to assign weights as a function of the x_categorical_1 group size: "a" = 4 values (4/7 ≈ 0.6) and "b" = 3 values (3/7 ≈ 0.4):

# w data (observation weights)
w_train <- c(0.6,0.6,0.6,0.6,0.4,0.4,0.4)

# y data (target)
y_train <- c(1,3,1,4,1,5,1)

# Fit the model
model.f <- train(data = x_train[1:2],
                 y = "y_train",
                 engine = c("ranger", "xgboost", "decision_tree", "lightgbm", "catboost"),
                 type = "regression",
                 weights = w_train)
Error in train(data = x_train[1:2], y = "y_train", engine = c("ranger",  : 
  unused argument (weights = w_train)

Thanks in advance!!

HubertR21 commented 8 months ago

Hi, sorry for the slow reply.

Unfortunately, the forester package doesn't offer such a feature. However, it has other ways of dealing with imbalanced classes, such as train-test-validation splits that preserve the original data distribution. We also support evaluation with metrics designed for imbalanced classes, such as balanced accuracy.
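For readers who still need per-observation weights, one possible workaround is to fit a single engine directly, outside forester. The sketch below is only an illustration under stated assumptions: it reuses the toy data from the question, swaps in inverse-frequency weights (an assumed scheme, not the 0.6/0.4 values from the question), and passes them to `ranger::ranger()` through its `case.weights` argument. It is not something forester::train provides.

```r
# Toy data from the question above.
x_train <- data.frame(
  x_numeric_1 = c(1, 2, 3, 4, 5, 6, 7),
  x_numeric_2 = c(1, 3, 5, 1, 3, 5, 1),
  x_categorical_1 = c("a", "a", "a", "a", "b", "b", "b")
)
y_train <- c(1, 3, 1, 4, 1, 5, 1)

# Group sizes: a = 4, b = 3.
group_size <- table(x_train$x_categorical_1)

# Inverse-frequency weights (an assumption about the weighting scheme):
# each row gets n / (k * n_group), so both areas carry the same total weight.
n <- nrow(x_train)
k <- length(group_size)
area <- as.character(x_train$x_categorical_1)
w_train <- as.numeric(n / (k * group_size[area]))

# Fit one engine directly, outside forester, only if ranger is installed.
if (requireNamespace("ranger", quietly = TRUE)) {
  train_df <- cbind(x_train[1:2], y = y_train)
  fit <- ranger::ranger(y ~ ., data = train_df, case.weights = w_train)
}
```

Note that in ranger, `case.weights` changes how likely each row is to be drawn into the bootstrap sample of each tree; it does not reweight the loss. Other engines handle weights differently, so the effect is not directly comparable across them.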

Leprechault commented 8 months ago

Thanks for your answer @HubertR21. Best wishes!!