Closed. meet1704 closed this 8 months ago.
Hey @meet1704, did you set the seed before fitting the two models? GBM, by default, samples the data in various ways before fitting each tree in the sequence. Also, without a reproducible example (or the code used to fit and deploy the model(s)), it is rather difficult to diagnose the issue.
Hey @bgreenwell,
I set the seed before prediction. I was not able to reproduce the issue with toy code, so I am sharing the actual case. My model file (model.Rda) and test data (test_record.Rda) are at the link below: https://github.com/meet1704/GBM_issue
Here is my code snippet:
    library(gbm)
    load("test_record.Rda")
    # The GBM_Model_2_* and gbmtree_2_* objects used below must already be in the workspace.
    if (nrow(Analytical_ML) > 0) {
      # Build the model and tree-count object names from columns 25, 26 and 12
      Analytical_ML$model_name <- paste("GBM_Model_2_", Analytical_ML[,25], "_", Analytical_ML[,26], "_", Analytical_ML[,12], sep="")
      Analytical_ML$model_tree <- paste("gbmtree_2_", Analytical_ML[,25], "_", Analytical_ML[,26], "_", Analytical_ML[,12], sep="")
      country_combinations <- data.frame(unique(Analytical_ML$model_name))
      Prediction = NULL
      temp <- NULL
      for (t in 1:nrow(country_combinations)) {
        set.seed(1234)
        # Records that belong to the current model
        Analytical_test_CC <- Analytical_ML[Analytical_ML$model_name %in% country_combinations[t,],]
        model <- paste("GBM_Model_2_", Analytical_test_CC[1,25], "_", Analytical_test_CC[1,26], "_", Analytical_test_CC[1,12], sep="")
        tree <- paste("gbmtree_2_", Analytical_test_CC[1,25], "_", Analytical_test_CC[1,26], "_", Analytical_test_CC[1,12], sep="")
        # On any error, fall back to the sentinel value 99999
        Prediction <- tryCatch({floor(predict.gbm(get(model), Analytical_test_CC, n.trees = get(tree)))}, error=function(e){99999})
        print(Prediction)
        Prediction[Prediction < 0 & Prediction != ""] <- 0
        Analytical_test_CC <- cbind(Analytical_test_CC, Prediction)
        temp <- rbind(temp, Analytical_test_CC)
      }
    }
    temp$Prediction
The three records sent to both servers are exactly the same, and the factor levels for the records are properly synced on both servers. Moreover, we have made around 100K predictions, but the difference appears in only about 0.5% of the records, without any apparent pattern.
    ############ Results - server1 - Predictions - Server config - R3.4.1 - gbm 2.1.3
           Prediction
    729286        147
    730285        147
    731766        147

    ############### Results - server2 - Predictions - Server config - R3.4.1 - gbm 2.1.3
           Prediction
    729286        144
    730285        144
    731766        144
Thanks in advance !!!!
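One way to narrow this down is to compare the predictions from the two servers record by record, ideally the raw values before `floor()` is applied, since `floor()` can turn a tiny floating-point difference into a full unit. The sketch below uses only base R and assumes the predictions from each server have been saved (for example with `saveRDS()`) and loaded as named numeric vectors. The helper name `compare_predictions` and the tolerance are hypothetical, and the sample values are the three records reported above.

```r
# Hypothetical helper: compare predictions from two servers, matched by record ID.
# `a` and `b` are named numeric vectors (names = record IDs).
compare_predictions <- function(a, b, tol = 1e-8) {
  ids <- intersect(names(a), names(b))
  out <- data.frame(
    id = ids,
    server1 = unname(a[ids]),
    server2 = unname(b[ids]),
    diff = unname(a[ids] - b[ids]),
    stringsAsFactors = FALSE
  )
  # Keep only records whose predictions differ by more than the tolerance
  out[which(abs(out$diff) > tol), , drop = FALSE]
}

# Floored values reported above for the three records
server1 <- c("729286" = 147, "730285" = 147, "731766" = 147)
server2 <- c("729286" = 144, "730285" = 144, "731766" = 144)
mismatches <- compare_predictions(server1, server2)
print(mismatches)
```

If the raw values already differ by several units, rounding is unlikely to be the cause, and it may be worth checking that `get(tree)` and the factor columns are identical on both servers.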
Hello Team,
We are running a GBM model on two different servers, with exactly the same R version and gbm version. We are predicting on exactly the same data, but GBM produces different predictions for some of the records. We do not apply any external label encoding before prediction; we rely on the package's internal label encoding.
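Because the encoding is left to the package, one check worth running on both servers is whether the records contain categorical values that the fitted model has no level for. The sketch below is an illustration under assumptions: it supposes the fitted object stores `var.names`, `var.type` (0 for numeric, a positive level count for factors) and `var.levels`, as I believe gbm objects do, and it uses a mock list in place of a real model. The helper name `unseen_levels` and the sample columns are hypothetical.

```r
# Hypothetical helper: list values in `newdata` that the model has no level for.
# Assumes a gbm-like object with `var.names`, `var.type` and `var.levels`.
unseen_levels <- function(model, newdata) {
  result <- list()
  for (i in seq_along(model$var.names)) {
    v <- model$var.names[i]
    # Only categorical predictors that are present in the new data
    if (model$var.type[i] > 0 && v %in% names(newdata)) {
      extra <- setdiff(unique(as.character(newdata[[v]])),
                       as.character(model$var.levels[[i]]))
      extra <- extra[!is.na(extra)]
      if (length(extra) > 0) result[[v]] <- extra
    }
  }
  result
}

# Mock object standing in for a fitted model (illustration only)
mock_model <- list(
  var.names  = c("country", "age"),
  var.type   = c(3, 0),
  var.levels = list(c("DE", "FR", "US"), c(18, 40, 65))
)
new_records <- data.frame(country = c("US", "IN"), age = c(30, 41),
                          stringsAsFactors = FALSE)
res <- unseen_levels(mock_model, new_records)
print(res)
```

Running the same check on each server, together with a comparison of `levels()` for every factor column, would show whether the two environments really see identical categorical inputs.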
This is the distribution of the predicted variable on server 1, read as follows: 1821 records were predicted in bucket 0, 14236 in bucket 1, and so on.

    table(temp$Prediction)
         0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18
      1821 14236 21582 12316 18035  6724 12986  4713  4908  8167   893   672    84   216   554   557  1285   205    27

        19    20    21    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38
        69    45   207    54     7   176    30   653    22     6    97   123     1    76  3096    54   143    27    38

        39    41    42    43    47    48 99999
        18    56   267   242   182    31  4244
Prediction distribution on server 2:

    table(temp$Prediction)
         0     1     2     3     4     5     6     7     8     9    10    11    12
      1826 14208 21743 12175 18037  6744 13913  3742  4890  8178   923   676    82

        13    14    15    16    17    18    19    20    21    23    24    25    26
       216   583   529  1284   205    27    69    67   185    54     7   176    30

        27    28    29    30    31    32    33    34    35    36    37    38    39
       653    22     6    97    16   108    76  3096    54   143    27    38    18

        41    42    43    47    48 99999
        56   267   242   182    31  4244
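To see at a glance which buckets differ between the two distributions above, the counts can be lined up by bucket and subtracted. This is a minimal base R sketch; the helper name `diff_counts` is hypothetical, and the sample vectors hold only a few of the buckets listed above.

```r
# Hypothetical helper: per-bucket difference between two named count vectors
# (names = predicted bucket). Buckets missing on one side count as 0.
diff_counts <- function(c1, c2) {
  buckets <- union(names(c1), names(c2))
  n1 <- unname(c1[buckets]); n1[is.na(n1)] <- 0
  n2 <- unname(c2[buckets]); n2[is.na(n2)] <- 0
  out <- data.frame(bucket = buckets, server1 = n1, server2 = n2,
                    diff = n2 - n1, stringsAsFactors = FALSE)
  # Keep only the buckets whose counts changed
  out[out$diff != 0, , drop = FALSE]
}

# A few buckets taken from the two distributions above
server1 <- c("0" = 1821, "1" = 14236, "2" = 21582, "99999" = 4244)
server2 <- c("0" = 1826, "1" = 14208, "2" = 21743, "99999" = 4244)
changed <- diff_counts(server1, server2)
print(changed)
```

With real output, a table `tab` from `table(temp$Prediction)` can be converted first with `setNames(as.vector(tab), names(tab))`.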