dmitryikh / leaves

Pure Go implementation of the prediction part for GBRT (Gradient Boosting Regression Trees) models from popular frameworks

error for load xgboost:gbtree #53

Open randbear opened 5 years ago

randbear commented 5 years ago

I got an error when I tried to load a binary xgboost:gbtree model. The error message is as follows:

panic: unexpected EOF

goroutine 1 [running]:
main.main()
        /Users/zhangxiatian/tuotuo/workspace/go/predictor/main.go:13 +0x1ba

Process finished with exit code 2

The code is as follows:

    package main

    import (
        "fmt"

        "github.com/dmitryikh/leaves"
    )

    func main() {
        // 1. Read model
        model, err := leaves.XGEnsembleFromFile("/Users/zhangxiatian/tuotuo/recsys/engin/model/model")
        if err != nil {
            panic(err)
        }
        // 2. Print the number of trees so fmt and model are used
        fmt.Printf("NEstimators: %d\n", model.NEstimators())
    }
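One way to see what the loader is actually being given is to dump the file's size and first few bytes; a leading '{' would mean a JSON model rather than the legacy binary format. A minimal sketch using only the standard library (the path is taken from the report above):

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // dump the size and the first bytes of the model file to see
        // which format the loader is actually being given
        path := "/Users/zhangxiatian/tuotuo/recsys/engin/model/model"
        f, err := os.Open(path)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        info, err := f.Stat()
        if err != nil {
            panic(err)
        }

        buf := make([]byte, 16)
        n, _ := f.Read(buf) // error ignored: a short read is fine for a peek
        fmt.Printf("size: %d bytes, first bytes: %q\n", info.Size(), buf[:n])
    }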

dmitryikh commented 5 years ago

Hi! Thanks for your feedback. Could you please provide more information:

  1. How did you build the model file? Did you use the Python bindings for xgboost, or something else? Can you provide the code?
  2. What version of xgboost did you use?
dmitryikh commented 5 years ago

@randbear, did you manage to solve the problem by yourself? If not, could you answer my questions above?

Thanks!

xexiyong commented 5 years ago

I'm hitting the same issue. Strangely, the model file has nearly doubled in size, yet when I open it in vim nothing appears to have changed.

anuragkyal commented 4 years ago

@dmitryikh Saw the same error today. xgboost version 1.1.1. Training code:

    import xgboost as xgb

    # train/test DataFrames and train_group/test_group are prepared upstream
    train_labels = train['click']
    train_features = train.drop('click', axis=1)

    test_labels = test['click']
    test_features = test.drop('click', axis=1)

    # convert to data matrix
    train_matrix = xgb.DMatrix(train_features, train_labels)
    test_matrix = xgb.DMatrix(test_features, test_labels)

    # group by query
    train_matrix.set_group(train_group)
    test_matrix.set_group(test_group)

    watchlist = [(train_matrix, 'train'), (test_matrix, 'eval')]

    param = {
        'objective': 'rank:ndcg',
        'max_depth': 10,
        'eta': 0.1,
        'eval_metric': ['ndcg'],
        'colsample_bytree': 0.8,
        'subsample': 0.8,
        'tree_method': 'hist',
        'nthread': 64,
        'verbosity': 1
    }
    bst = xgb.train(param, train_matrix, 50, watchlist, early_stopping_rounds=10)
    temp_file = "/tmp/xgboost.model"
    bst.save_model(temp_file)
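The Go side that panics is essentially the standard leaves loading sequence (a sketch, not my exact code; the loadTransformation flag and the all-zero dummy row are assumptions):

    package main

    import (
        "fmt"

        "github.com/dmitryikh/leaves"
    )

    func main() {
        // false: load raw scores without the objective transformation
        model, err := leaves.XGEnsembleFromFile("/tmp/xgboost.model", false)
        if err != nil {
            panic(err) // panics here with "unexpected EOF"
        }

        // score one all-zero dummy row just to exercise the model
        fvals := make([]float64, model.NFeatures())
        fmt.Println(model.PredictSingle(fvals, 0))
    }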

Any ideas why?

guiyang882 commented 4 years ago

I have the same problem.

//
//  main.cpp
//  xgboost-example
//
//  Created by WuMing on 2020/8/26.
//  Copyright © 2020 WuMing. All rights reserved.
//

#include <stdio.h>
#include <stdlib.h>
#include <xgboost/c_api.h>

#define safe_xgboost(call) {                                            \
int err = (call);                                                       \
if (err != 0) {                                                         \
    fprintf(stderr, "%s:%d: error in %s: %s\n", __FILE__, __LINE__, #call, XGBGetLastError()); \
    exit(1);                                                              \
}                                                                       \
}

struct RegressionConfig {
    const char* train_path;
    const char* valid_path;
    const char* objective;
    const char* eta;
    const char* gamma;
    const char* max_depth;
    const char* booster;
    int n_trees;
    const char* model_save_path;
};

int train_process(struct RegressionConfig* regConfig) {
    int silent = 0;
    int use_gpu = 0;  // set to 1 to use the GPU for training

    // load the data
    DMatrixHandle dtrain, dtest;
    safe_xgboost(XGDMatrixCreateFromFile(regConfig->train_path, silent, &dtrain));
    safe_xgboost(XGDMatrixCreateFromFile(regConfig->valid_path, silent, &dtest));

    // create the booster
    BoosterHandle booster;
    DMatrixHandle eval_dmats[2] = {dtrain, dtest};
    safe_xgboost(XGBoosterCreate(eval_dmats, 2, &booster));

    // configure the training
    // available parameters are described here:
    //   https://xgboost.readthedocs.io/en/latest/parameter.html
    safe_xgboost(XGBoosterSetParam(booster, "tree_method", use_gpu ? "gpu_hist" : "hist"));
    if (use_gpu) {
        // set the GPU to use;
        // this is not necessary, but provided here as an illustration
        safe_xgboost(XGBoosterSetParam(booster, "gpu_id", "0"));
    } else {
        // avoid evaluating objective and metric on a GPU
        safe_xgboost(XGBoosterSetParam(booster, "gpu_id", "-1"));
    }

    safe_xgboost(XGBoosterSetParam(booster, "objective", regConfig->objective));
    safe_xgboost(XGBoosterSetParam(booster, "eta", regConfig->eta));
    safe_xgboost(XGBoosterSetParam(booster, "gamma", regConfig->gamma));
    safe_xgboost(XGBoosterSetParam(booster, "max_depth", regConfig->max_depth));
    safe_xgboost(XGBoosterSetParam(booster, "booster", regConfig->booster));
    safe_xgboost(XGBoosterSetParam(booster, "verbosity", silent ? "0" : "1"));

    // train and evaluate for n_trees iterations
    int n_trees = regConfig->n_trees;
    const char* eval_names[2] = {"train", "test"};
    const char* eval_result = NULL;
    for (int i = 0; i < n_trees; ++i) {
        safe_xgboost(XGBoosterUpdateOneIter(booster, i, dtrain));
        safe_xgboost(XGBoosterEvalOneIter(booster, i, eval_dmats, eval_names, 2, &eval_result));
        printf("%s\n", eval_result);
    }

    const char* save_path = regConfig->model_save_path;
    printf("save_path is %s\n", save_path);
    safe_xgboost(XGBoosterSaveModel(booster, save_path));
    safe_xgboost(XGBoosterLoadModel(booster, save_path));
    printf("load_path is %s\n", save_path);

    bst_ulong num_feature = 0;
    safe_xgboost(XGBoosterGetNumFeature(booster, &num_feature));
    printf("num_feature: %llu\n", num_feature);

    // predict
    bst_ulong out_len = 0;
    const float* out_result = NULL;
    int n_print = 10;

    safe_xgboost(XGBoosterPredict(booster, dtest, 0, 0, 0, &out_len, &out_result));
    printf("y_pred: ");
    for (int i = 0; i < n_print; ++i) {
        printf("%1.4f ", out_result[i]);
    }
    printf("\n");

    // print true labels
    safe_xgboost(XGDMatrixGetFloatInfo(dtest, "label", &out_len, &out_result));
    printf("y_test: ");
    for (int i = 0; i < n_print; ++i) {
        printf("%1.4f ", out_result[i]);
    }
    printf("\n");

    // free everything
    safe_xgboost(XGBoosterFree(booster));
    safe_xgboost(XGDMatrixFree(dtrain));
    safe_xgboost(XGDMatrixFree(dtest));
    return 0;
}

int main() {
    struct RegressionConfig regConfig = {0};
    regConfig.train_path = "/tmp/sls-xgb/3313a70119a693cec4d613a4b742d26c.data.train.txt";
    regConfig.valid_path = "/tmp/sls-xgb/3313a70119a693cec4d613a4b742d26c.data.valid.txt";
    regConfig.objective = "reg:gamma";
    regConfig.booster = "gbtree";
    regConfig.eta = "0.01";
    regConfig.max_depth = "8";
    regConfig.gamma = "0.1";
    regConfig.n_trees = 1000;
    regConfig.model_save_path = "/tmp/sls-xgb/xgboost.model";
    train_process(&regConfig);
    return 0;
}
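The file saved at /tmp/sls-xgb/xgboost.model then fails to load in Go the same way. A minimal sketch of the loading side (the loadTransformation flag is an assumption); if the load succeeded, NFeatures should agree with the num_feature printed by the C program above:

    package main

    import (
        "fmt"

        "github.com/dmitryikh/leaves"
    )

    func main() {
        model, err := leaves.XGEnsembleFromFile("/tmp/sls-xgb/xgboost.model", false)
        if err != nil {
            panic(err) // fails with "unexpected EOF"
        }
        // should match the num_feature reported by the C program
        fmt.Printf("NFeatures: %d\n", model.NFeatures())
    }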
caibinbupt commented 4 years ago

Any progress? We have the same issue.

ohandyya commented 3 years ago

I faced the same problem as well. Any progress?

dmitryikh commented 3 years ago

Hi! Can anybody provide me with an xgboost model file that reproduces the error? Then I will be able to reproduce the bug on my side and finally fix it.

Thanks!

ohandyya commented 3 years ago

Here is a dummy model, dummy_xgb.model.zip, that you can use. Loading this model in Go with

model, err := leaves.XGEnsembleFromFile("dummy_xgb.model", useTransformation)

gives me the unexpected EOF error.
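For reference, the full program is just the following (the value of useTransformation is an assumption; true should apply the binary:logistic transformation):

    package main

    import "github.com/dmitryikh/leaves"

    func main() {
        useTransformation := true // apply the binary:logistic transformation
        model, err := leaves.XGEnsembleFromFile("dummy_xgb.model", useTransformation)
        if err != nil {
            panic(err) // panic: unexpected EOF
        }
        _ = model // never reached
    }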

It was created with the simple script below.

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Convert y_train and y_test to binary
y_train = np.array(y_train >= 1).astype(int)
y_test = np.array(y_test >= 1).astype(int)

xg_train = xgb.DMatrix(X_train, label=y_train)
xg_test = xgb.DMatrix(X_test, label=y_test)

params = {'objective': 'binary:logistic'}

clf = xgb.train(params, xg_train, 5)

# Make sure prediction works
print(clf.predict(xg_test))

# Save model
clf.save_model("dummy_xgb.model")

And here is my Python environment:

- System: Python 3.8.4 (Docker)
- Python: 3.8.4
- xgboost: 1.2.0
- numpy: 1.18.2

And my Go environment:

- System: Mac
- Go version: 1.14.2

ohandyya commented 3 years ago

Update:

I tried loading dummy_xgb.model in a golang:1.14 container, and I still get unexpected EOF. I think this rules out the possibility that macOS is the culprit.

ohandyya commented 3 years ago

Hi @dmitryikh, I am wondering if you have had any time to look at this issue?

Update (11/16/2020): I have switched my model from XGBoost to LightGBM, as it yields better performance on my data. Your package works fantastically with the LightGBM model, so XGBoost is no longer an issue for me. Thanks a lot for putting together this awesome package.

accfcx commented 2 years ago

I'm hitting the same problem. A colleague on the algorithm team gave me an xgboost model file; when I load it in my Go project with leaves and step through with the debugger, it fails inside readHeader with an EOF error. I don't understand what is causing it.

adangadang commented 2 years ago

xg_iris.zip (contains xg_iris.model)

- xgboost: 1.5.2
- golang: 1.6
- python: 3.7

Fails in readHeader with an EOF error.

adangadang commented 2 years ago

python build_iris_model.py

[13:27:51] WARNING: ../src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softmax' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.

go run predict_iris_model.go

panic: unexpected EOF

goroutine 1 [running]:
main.main()
        /home/ctr/leaves/test/predict_iris_model.go:21 +0x6c5
exit status 2
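For context, predict_iris_model.go is just the usual leaves loading and prediction sequence, roughly like this (a reconstruction sketch, not the exact file; the feature values are dummies):

    package main

    import (
        "fmt"

        "github.com/dmitryikh/leaves"
    )

    func main() {
        model, err := leaves.XGEnsembleFromFile("xg_iris.model", false)
        if err != nil {
            panic(err) // panic: unexpected EOF (raised from readHeader)
        }

        // one dummy iris sample: 4 features, multiple output groups for multi:softmax
        fvals := []float64{5.1, 3.5, 1.4, 0.2}
        preds := make([]float64, model.NOutputGroups())
        if err := model.Predict(fvals, 0, preds); err != nil {
            panic(err)
        }
        fmt.Println(preds)
    }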