Closed GeorgeXia1828 closed 7 years ago
I ran into this problem too. Have you solved it?
Have you solved this problem?
@hzliang It is just an index error: train with feature indices starting from 1, and test with the Python model using indices starting from 0. This is described on the sparkxgboost GitHub page.
The train-error in the training log is indeed correct.
Save the model with xgbModel.booster.saveModel("/local/path"); then you can use it with the xgboost Python API.
When predicting with the Python module, the features should be transformed from ... 1:xx_1,2:xx_2,3:xx_3 to 0:xx_1,1:xx_2,2:xx_3,3:0. Maybe this will solve the problem of moving a model over from Spark xgboost.
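A minimal sketch of that index shift in plain Python, in case it helps. The function name shift_feature_indices and the pad_index parameter are made up for illustration and are not part of xgboost; the comma-separated layout follows the example above (real LibSVM files usually separate pairs with spaces).

```python
def shift_feature_indices(features, pad_index=None):
    """Shift 1-based 'index:value' pairs to 0-based indices.

    `features` is a comma-separated string such as '1:a,2:b,3:c'.
    If `pad_index` is given, a trailing '<pad_index>:0' pair is appended,
    mirroring the '3:0' entry in the example above (hypothetical parameter).
    """
    shifted = []
    for pair in features.split(","):
        index, value = pair.split(":", 1)
        # Move every feature index down by one; values are left untouched.
        shifted.append(f"{int(index) - 1}:{value}")
    if pad_index is not None:
        shifted.append(f"{pad_index}:0")
    return ",".join(shifted)


# Reproduces the transformation described in the comment above.
converted = shift_feature_indices("1:xx_1,2:xx_2,3:xx_3", pad_index=3)
print(converted)
```

The trailing zero-valued pair appears intended to keep the highest feature index the same as in training; whether it is needed probably depends on how the test data is loaded.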
I am training a Spark xgboost model. The train-error in the log is about 0.28. I saved the model, then loaded it to test it on the test set, and got very poor AUC and accuracy (auc = 0.65, acc = 0.55). I think the accuracy should be about 0.72, and the AUC should be much higher than 0.72. I also tried it on the training set and got the same result as on the test set. So I am confused: why is the accuracy different from the log?
1, My trainModel code
The log is:
2017-07-24 12:50:11,391-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:50:11,390 INFO [93] train-error:0.287650
2017-07-24 12:50:40,961-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:50:40,960 INFO [94] train-error:0.287634
2017-07-24 12:51:10,258-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:51:10,258 INFO [95] train-error:0.287631
2017-07-24 12:51:39,403-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:51:39,403 INFO [96] train-error:0.287623
2017-07-24 12:52:09,241-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:52:09,241 INFO [97] train-error:0.287612
2017-07-24 12:52:38,593-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:52:38,592 INFO [98] train-error:0.287607
2017-07-24 12:53:07,767-[TS] INFO Thread-50 ml.dmlc.xgboost4j.java.RabitTracker$TrackerProcessLogger - 2017-07-24 12:53:07,767 INFO [99] train-error:0.287586
2, my test model code
3, Then I load the saved prediction results with pyspark to calculate the AUC. Using BinaryClassificationMetrics (from pyspark.mllib.evaluation import BinaryClassificationMetrics), the AUC (areaUnderROC) is only 0.65, and I tried again on the training set and got the same!
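One way to cross-check an areaUnderROC number like this outside Spark is to compute the AUC directly on a small sample of (score, label) pairs. The sketch below is plain Python; the function name auc_from_scores and the sample values are made up for illustration and are not taken from the data in this issue. It uses the pairwise ranking definition of AUC, so it only suits small samples.

```python
def auc_from_scores(score_and_labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `score_and_labels` is a list of (score, label) tuples with labels 0 or 1,
    the same (score, label) order that BinaryClassificationMetrics expects.
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in score_and_labels if y == 1]
    negatives = [s for s, y in score_and_labels if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up sample, not from the issue's data.
sample = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.2, 0)]
print(auc_from_scores(sample))
```

If this number disagrees with Spark on the same sample, it is worth checking that the tuples are passed as (score, label) and not (label, score).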