dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Problems about predicting by pandas #9928

Closed 903019003 closed 8 months ago

903019003 commented 9 months ago

I trained an XGBoost model on data in libsvm format. When I predict on the same data loaded as a pandas DataFrame, the model returns wrong predictions.

903019003 commented 9 months ago

Tree structure like this:

booster[0]:
0:[4003<1] yes=1,no=2,missing=2
  1:[21<0.00287356321] yes=3,no=4,missing=4
    3:[1563<0.100484997] yes=7,no=8,missing=8
      7:[10023<0.380952388] yes=15,no=16,missing=16
        15:leaf=0.421930671
        16:leaf=0.437725931
      8:[10<2.08999991] yes=17,no=18,missing=18
        17:leaf=0.388969213
        18:leaf=0.423030943
    4:[10016<0.295454532] yes=9,no=10,missing=10
      9:[1565<0.136473] yes=19,no=20,missing=20
        19:leaf=0.366048157
        20:leaf=0.308834165
      10:[5016<0.429840147] yes=21,no=22,missing=22
        21:leaf=0.388634413
        22:leaf=0.426836252

train_data format:

2 10:0.98350227 11:-8.8888888E7 21:-8.8888888E7 314:0.0 317:0.41715977 409:-8.8888888E7

predict data format:

[Screenshot: Snipaste_2023-12-27_15-39-13]
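The thread closes without a diagnosis, so the following is only one thing worth checking, not a confirmed cause. In libsvm format, features absent from a row are treated as missing, and a split such as `0:[4003<1] yes=1,no=2,missing=2` sends a missing value to node 2 but a value of 0 to node 1. If the DataFrame fills absent features with 0, or orders its columns differently from the libsvm indices, predictions can diverge. The sketch below converts the training row quoted above into a dense row with NaN for absent features. The helper name `libsvm_line_to_dense`, the zero-based indexing, and the column count of 10024 are assumptions for illustration; none of them is stated in the thread.

```python
import math


def libsvm_line_to_dense(line, n_features):
    """Parse one libsvm line into (label, dense_row).

    Hypothetical helper. Features absent from the line are set to NaN so a
    dense consumer can treat them as missing rather than as 0.
    """
    parts = line.split()
    label = float(parts[0])
    row = [math.nan] * n_features
    for item in parts[1:]:
        index, value = item.split(":")
        # Assumption: libsvm indices are used directly as zero-based columns.
        row[int(index)] = float(value)
    return label, row


# Training row quoted in the comment above.
line = (
    "2 10:0.98350227 11:-8.8888888E7 21:-8.8888888E7 "
    "314:0.0 317:0.41715977 409:-8.8888888E7"
)

# Assumption: the real feature count is unknown; the tree dump references
# feature 10023, so 10024 columns is the smallest width that fits.
label, row = libsvm_line_to_dense(line, n_features=10024)

present = [i for i, v in enumerate(row) if not math.isnan(v)]
print(label, present)
```

A row built this way can be placed in a DataFrame whose column order matches the feature indices, which makes it easier to compare against predictions made from the original libsvm file.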

mosaikme commented 9 months ago

Hey, which versions of pandas and xgboost are you using?
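A minimal sketch for collecting the versions asked about here, using only the standard library so it runs even if one of the packages is not installed. The helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("xgboost", "pandas"):
    print(name, installed_version(name))
```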

trivialfis commented 8 months ago

Closing as stalled. Feel free to reopen if there is further information.