ModelOriented / forester

Trees are all you need
https://modeloriented.github.io/forester/
GNU General Public License v3.0

Save only one selected trained model, not all of them #124

Closed Leprechault closed 3 months ago

Leprechault commented 3 months ago

Please, I need help. After fitting and tuning the forester models, I save the result using:

cc_model <- train(data = data_train,
                  y = "cc",
                  engine = c("ranger", "xgboost", "decision_tree", "lightgbm", "catboost"),
                  type = "regression")
saveRDS(cc_model,"C:/Users/fores/OneDrive/CombateSFCloud/dataset/model_creation/cc_ml_v16.rds")

But I need to store only the decision_tree_RS_9 model, not all the models, in the cc_ml_v16.rds file.

Could you please help with this?

HubertR21 commented 3 months ago

Hi, if you want to save a particular object from the output instead of the full train() result, follow these steps:

  1. Assign the selected model to a variable by selecting it from the output.
  2. Use the saveRDS() function and provide a file name (the example uses a .RData extension; .rds is the more common convention for files written by saveRDS()).
  3. To load the file, use the readRDS() function.

Alternatively, you could omit the first step.

model <- cc_model$models_list$decision_tree_RS_9
saveRDS(model, file = "model.RData")
x <- readRDS("model.RData")

# Alternatively
saveRDS(cc_model$models_list$decision_tree_RS_9, file = "model.RData")
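
The same extract, save, and reload pattern can be tried without forester installed. The sketch below is a minimal base-R illustration; the list train_out and its fields are hypothetical stand-ins for a train() result, not the real forester structure.

```r
# Hypothetical stand-in for a train() result: a list holding several fitted models.
train_out <- list(
  models_list = list(
    model_a = lm(mpg ~ wt, data = mtcars),
    model_b = lm(mpg ~ wt + hp, data = mtcars)
  ),
  data = mtcars
)

# Pick one model and write only that object to disk.
single_model <- train_out$models_list$model_a
path <- tempfile(fileext = ".rds")
saveRDS(single_model, file = path)

# readRDS() returns the object, so assign it to a name of your choice.
restored <- readRDS(path)

# Check that the restored model predicts the same values as the original.
same_predictions <- all.equal(
  predict(restored, newdata = mtcars),
  predict(single_model, newdata = mtcars)
)
print(same_predictions)
```

Because readRDS() returns the object rather than restoring it under its old name, the saved model can be loaded under any variable name.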

Leprechault commented 3 months ago

Unfortunately, @HubertR21, that doesn't work. I tried it and the output is:

model <- cc_model$models_list$decision_tree_RS_9
saveRDS(model, file = "model.RData")
x <- readRDS("model.RData")
test_predictions <- predict_new(x,data_test_f)
Error in if (type == "binary_clf") {

I checked the data_test_f object, and all the variables look OK; the same data works with test_predictions <- predict_new(cc_model, data_test_f)

HubertR21 commented 3 months ago

Okay, I see where the issue is. If you just want to save the model and use it standalone, fine-tune it, etc., then the method above works. But if you also want access to other forester features, like the aforementioned predict_new(), then you need the whole object.

The reason is that when predicting on new data, we have to ensure that the incoming data is processed in exactly the same way as the training data was.
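
To illustrate why the preprocessing has to travel with the model, here is a minimal base-R sketch. It is not forester's internal pipeline: the standardization step, the bundle list, and the predict_bundle() helper are all hypothetical, chosen only to show that a model trained on transformed data needs the same transformation applied to new data.

```r
# Split a built-in dataset into training and test rows.
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Standardize the predictor using statistics from the training data only.
wt_mean <- mean(train$wt)
wt_sd   <- sd(train$wt)
train_scaled <- data.frame(mpg = train$mpg, wt = (train$wt - wt_mean) / wt_sd)

fit <- lm(mpg ~ wt, data = train_scaled)

# Hypothetical slim bundle: the model plus the parameters needed to
# preprocess new data exactly as the training data was preprocessed.
bundle <- list(model = fit, wt_mean = wt_mean, wt_sd = wt_sd)

# Hypothetical helper that reapplies the stored preprocessing before predicting.
predict_bundle <- function(bundle, new_data) {
  scaled <- data.frame(wt = (new_data$wt - bundle$wt_mean) / bundle$wt_sd)
  unname(predict(bundle$model, newdata = scaled))
}

correct <- predict_bundle(bundle, test)

# Feeding raw, unscaled data to the bare model runs without error
# but yields different numbers.
wrong <- unname(predict(fit, newdata = test))

print(data.frame(correct = correct, wrong = wrong))
```

The bare model accepts the raw data without complaint, which is why a standalone model can give silently wrong predictions when its preprocessing is left behind.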

Thanks for the issue, as now I see an opportunity for improvement where the user can select a single model to predict with!

Leprechault commented 3 months ago

Thank you, @HubertR21, for your support. I would like to request that, in the future, you consider supporting prediction with a single selected model. When I open the complete train object, it uses up 45GB of RAM, which makes it difficult to use the forester model within a Shiny dashboard. This forces me to use an expensive cloud VM, which is not ideal.
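
When memory is the constraint, it can help to check which parts of a large result dominate. The sketch below uses base R's object.size() on a hypothetical list; the field names are placeholders, not the actual structure of a forester train() output.

```r
# Hypothetical stand-in for a large training result.
big_result <- list(
  models_list = list(small_model = lm(mpg ~ wt, data = mtcars)),
  raw_data    = matrix(rnorm(1e6), ncol = 100),  # one million doubles, about 8 MB
  type        = "regression"
)

# Size of each top-level component in bytes, largest first.
component_bytes <- sort(
  vapply(big_result, function(x) as.numeric(object.size(x)), numeric(1)),
  decreasing = TRUE
)

# Human-readable view in megabytes.
print(round(component_bytes / 1024^2, 2))
```

Note that object.size() is an estimate and does not count environments attached to an object, so it may understate the footprint of some model objects.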

HubertR21 commented 3 months ago

Hello again @Leprechault. Just a while ago we released a new package version, where I've added the requested feature. I hope you will find it useful!

Leprechault commented 3 months ago

Thanks very much!!