dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Supporting ppc64le target with old system libs (GLIBC, LIBSTDC++, LIBGOMP) #4724

Closed: sh1ng closed this issue 4 years ago

sh1ng commented 5 years ago

I'm trying to create a portable libxgboost.so for systems with a GCC version older than 5.0. Everything is built in the Docker image nvidia/cuda:10.0-cudnn7-devel-centos7: GCC 5.3 is built inside the container and then used to build XGBoost.

Dynamic linking works fine, but a newer libstdc++.so then has to be present on every machine where XGBoost runs (for testing I use the same Docker image, but without GCC 5).

When XGBoost is built with static linking (-static-libgcc -static-libstdc++, or -static-libgcc -static-libstdc++ with libgcc.a/libstdc++.a), it fails with a rather odd error about parsing an input parameter:

self = XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=...ale_pos_weight=1,
             seed=None, silent=True, subsample=1.0, tree_method='gpu_hist',
             verbosity=1)
X = array([[ 8.  ,  0.45,  0.  ,  1.  ,  0.  ,  0.  ,  1.  ,  0.  ,  0.  ,
         0.  ],
       [ 7.  ,  0.99,  1.  ,  1...
         0.  ],
       [ 7.  ,  0.88,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
         1.  ]], dtype=float32)
y = array([1., 0., 1., 0., 0., 0., 1., 1.], dtype=float32), sample_weight = None
eval_set = None, eval_metric = None, early_stopping_rounds = None
early_stopping_threshold = None, early_stopping_limit = None, verbose = True
xgb_model = None, sample_weight_eval_set = None, callbacks = None

        def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
                early_stopping_rounds=None, early_stopping_threshold=None, early_stopping_limit=None, verbose=True, xgb_model=None,
                sample_weight_eval_set=None, callbacks=None):
            # pylint: disable=missing-docstring,invalid-name,attribute-defined-outside-init
            """
            Fit the gradient boosting model

            Parameters
            ----------
            X : array_like
                Feature matrix
            y : array_like
                Labels
            sample_weight : array_like
                instance weights
            eval_set : list, optional
                A list of (X, y) tuple pairs to use as a validation set for
                early-stopping
            sample_weight_eval_set : list, optional
                A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
                instance weights on the i-th validation set.
            eval_metric : str, callable, optional
                If a str, should be a built-in evaluation metric to use. See
                doc/parameter.rst. If callable, a custom evaluation metric. The call
                signature is func(y_predicted, y_true) where y_true will be a
                DMatrix object such that you may need to call the get_label
                method. It must return a str, value pair where the str is a name
                for the evaluation and value is the value of the evaluation
                function. This objective is always minimized.
            early_stopping_rounds : int
                Activates early stopping. Validation error needs to decrease at
                least every <early_stopping_rounds> round(s) to continue training.
                Requires at least one item in evals.  If there's more than one,
                will use the last. Returns the model from the last iteration
                (not the best one). If early stopping occurs, the model will
                have three additional fields: bst.best_score, bst.best_iteration
                and bst.best_ntree_limit.
                (Use bst.best_ntree_limit to get the correct value if num_parallel_tree
                and/or num_class appears in the parameters)
            early_stopping_threshold : float
             Sets an potional threshold to smoothen the early stopping policy.
               If after early_stopping_rounds iterations, the model hasn't improved
             more than threshold times the score from early_stopping_rounds before,
                then the learning stops.
            early_stopping_limit: float
                Sets limit of "threshold times the score from early_stopping_rounds_before"
                to value of limit.
            verbose : bool
                If `verbose` and an evaluation set is used, writes the evaluation
                metric measured on the validation set to stderr.
            xgb_model : str
                file name of stored xgb model or 'Booster' instance Xgb model to be
                loaded before training (allows training continuation).
            callbacks : list of callback functions
                List of callback functions that are applied at end of each iteration.
                It is possible to use predefined callbacks by using :ref:`callback_api`.
                Example:

                .. code-block:: python

                    [xgb.callback.reset_learning_rate(custom_rates)]
            """
            if sample_weight is not None:
                trainDmatrix = DMatrix(X, label=y, weight=sample_weight,
                                       missing=self.missing, nthread=self.n_jobs)
            else:
                trainDmatrix = DMatrix(X, label=y, missing=self.missing, nthread=self.n_jobs)

            evals_result = {}

            if eval_set is not None:
                if sample_weight_eval_set is None:
                    sample_weight_eval_set = [None] * len(eval_set)
                evals = list(
                    DMatrix(eval_set[i][0], label=eval_set[i][1], missing=self.missing,
                            weight=sample_weight_eval_set[i], nthread=self.n_jobs)
                    for i in range(len(eval_set)))
                evals = list(zip(evals, ["validation_{}".format(i) for i in
                                         range(len(evals))]))
            else:
                evals = ()

            params = self.get_xgb_params()

            if callable(self.objective):
                obj = _objective_decorator(self.objective)
                params["objective"] = "reg:linear"
            else:
                obj = None

            feval = eval_metric if callable(eval_metric) else None
            if eval_metric is not None:
                if callable(eval_metric):
                    eval_metric = None
                else:
                    params.update({'eval_metric': eval_metric})

            self._Booster = train(params, trainDmatrix,
                                  self.get_num_boosting_rounds(), evals=evals,
                                  early_stopping_rounds=early_stopping_rounds,
                                  early_stopping_threshold=early_stopping_threshold,
                                  early_stopping_limit=early_stopping_limit,
                                  evals_result=evals_result, obj=obj, feval=feval,
                                  verbose_eval=verbose, xgb_model=xgb_model,
>                                 callbacks=callbacks)

/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/sklearn.py:406:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
dtrain = <xgboost.core.DMatrix object at 0x7f1f6cb72908>, num_boost_round = 100
evals = (), obj = None, feval = None, maximize = False
early_stopping_rounds = None, early_stopping_threshold = None
early_stopping_limit = None, evals_result = {}, verbose_eval = True
xgb_model = None
callbacks = [<function print_evaluation.<locals>.callback at 0x7f1f6c70a268>, <function record_evaluation.<locals>.callback at 0x7f1f6c70a378>]
learning_rates = None

    def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
              maximize=False, early_stopping_rounds=None, early_stopping_threshold=None,early_stopping_limit=None,
              evals_result=None,
              verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None):
        # pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
        """Train a booster with given parameters.

        Parameters
        ----------
        params : dict
            Booster params.
        dtrain : DMatrix
            Data to be trained.
        num_boost_round: int
            Number of boosting iterations.
        evals: list of pairs (DMatrix, string)
            List of items to be evaluated during training, this allows user to watch
            performance on the validation set.
        obj : function
            Customized objective function.
        feval : function
            Customized evaluation function.
        maximize : bool
            Whether to maximize feval.
        early_stopping_rounds: int
            Activates early stopping. Validation error needs to decrease at least
            every **early_stopping_rounds** round(s) to continue training.
            Requires at least one item in **evals**.
            If there's more than one, will use the last.
            Returns the model from the last iteration (not the best one).
            If early stopping occurs, the model will have three additional fields:
            ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.
            (Use ``bst.best_ntree_limit`` to get the correct value if
            ``num_parallel_tree`` and/or ``num_class`` appears in the parameters)
        early_stopping_threshold : float
            Sets an potional threshold to smoothen the early stopping policy.
               If after early_stopping_rounds iterations, the model hasn't improved
            more than threshold times the score from early_stopping_rounds before,
            then the learning stops.
        early_stopping_limit: float
            Sets limit of "threshold times the score from early_stopping_rounds_before"
            to value of limit.
        evals_result: dict
            This dictionary stores the evaluation results of all the items in watchlist.

            Example: with a watchlist containing
            ``[(dtest,'eval'), (dtrain,'train')]`` and
            a parameter containing ``('eval_metric': 'logloss')``,
            the **evals_result** returns

            .. code-block:: python

                {'train': {'logloss': ['0.48253', '0.35953']},
                 'eval': {'logloss': ['0.480385', '0.357756']}}

        verbose_eval : bool or int
            Requires at least one item in **evals**.
            If **verbose_eval** is True then the evaluation metric on the validation set is
            printed at each boosting stage.
            If **verbose_eval** is an integer then the evaluation metric on the validation set
            is printed at every given **verbose_eval** boosting stage. The last boosting stage
            / the boosting stage found by using **early_stopping_rounds** is also printed.
            Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
            is printed every 4 boosting stages, instead of every boosting stage.
        learning_rates: list or function (deprecated - use callback API instead)
            List of learning rate for each boosting round
            or a customized function that calculates eta in terms of
            current number of round and the total number of boosting round (e.g. yields
            learning rate decay)
        xgb_model : file name of stored xgb model or 'Booster' instance
            Xgb model to be loaded before training (allows training continuation).
        callbacks : list of callback functions
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using
            :ref:`Callback API <callback_api>`.
            Example:

            .. code-block:: python

                [xgb.callback.reset_learning_rate(custom_rates)]

        Returns
        -------
        Booster : a trained booster model
        """
        callbacks = [] if callbacks is None else callbacks

        # Most of legacy advanced options becomes callbacks
        if isinstance(verbose_eval, bool) and verbose_eval:
            callbacks.append(callback.print_evaluation())
        else:
            if isinstance(verbose_eval, int):
                callbacks.append(callback.print_evaluation(verbose_eval))

        if early_stopping_rounds is not None:
            callbacks.append(callback.early_stop(early_stopping_rounds,
                                                 early_stopping_threshold,
                                                 early_stopping_limit,
                                                 maximize=maximize,
                                                 verbose=bool(verbose_eval)))
        if evals_result is not None:
            callbacks.append(callback.record_evaluation(evals_result))

        if learning_rates is not None:
            warnings.warn("learning_rates parameter is deprecated - use callback API instead",
                          DeprecationWarning)
            callbacks.append(callback.reset_learning_rate(learning_rates))

        return _train_internal(params, dtrain,
                               num_boost_round=num_boost_round,
                               evals=evals,
                               obj=obj, feval=feval,
>                              xgb_model=xgb_model, callbacks=callbacks)

/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:227:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

params = {'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, ...}
dtrain = <xgboost.core.DMatrix object at 0x7f1f6cb72908>, num_boost_round = 100
evals = [], obj = None, feval = None, xgb_model = None
callbacks = [<function print_evaluation.<locals>.callback at 0x7f1f6c70a268>, <function record_evaluation.<locals>.callback at 0x7f1f6c70a378>]

    def _train_internal(params, dtrain,
                        num_boost_round=10, evals=(),
                        obj=None, feval=None,
                        xgb_model=None, callbacks=None):
        """internal training function"""
        callbacks = [] if callbacks is None else callbacks
        evals = list(evals)
        if isinstance(params, dict) \
                and 'eval_metric' in params \
                and isinstance(params['eval_metric'], list):
            params = dict((k, v) for k, v in params.items())
            eval_metrics = params['eval_metric']
            params.pop("eval_metric", None)
            params = list(params.items())
            for eval_metric in eval_metrics:
                params += [('eval_metric', eval_metric)]

        bst = Booster(params, [dtrain] + [d[0] for d in evals])
        nboost = 0
        num_parallel_tree = 1

        if xgb_model is not None:
            if not isinstance(xgb_model, STRING_TYPES):
                xgb_model = xgb_model.save_raw()
            bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
            nboost = len(bst.get_dump())

        _params = dict(params) if isinstance(params, list) else params

        if 'num_parallel_tree' in _params:
            num_parallel_tree = _params['num_parallel_tree']
            nboost //= num_parallel_tree
        if 'num_class' in _params:
            nboost //= _params['num_class']

        # Distributed code: Load the checkpoint from rabit.
        version = bst.load_rabit_checkpoint()
        assert rabit.get_world_size() != 1 or version == 0
        rank = rabit.get_rank()
        start_iteration = int(version / 2)
        nboost += start_iteration

        callbacks_before_iter = [
            cb for cb in callbacks if cb.__dict__.get('before_iteration', False)]
        callbacks_after_iter = [
            cb for cb in callbacks if not cb.__dict__.get('before_iteration', False)]

        for i in range(start_iteration, num_boost_round):
            for cb in callbacks_before_iter:
                cb(CallbackEnv(model=bst,
                               cvfolds=None,
                               iteration=i,
                               begin_iteration=start_iteration,
                               end_iteration=num_boost_round,
                               rank=rank,
                               evaluation_result_list=None))
            # Distributed code: need to resume to this point.
            # Skip the first update if it is a recovery step.
            if version % 2 == 0:
>               bst.update(dtrain, i, obj)

/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/training.py:74:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <xgboost.core.Booster object at 0x7f1f6c700c88>
dtrain = <xgboost.core.DMatrix object at 0x7f1f6cb72908>, iteration = 0
fobj = None

    def update(self, dtrain, iteration, fobj=None):
        """Update for one iteration, with objective function calculated
        internally.  This function should not be called directly by users.

        Parameters
        ----------
        dtrain : DMatrix
            Training data.
        iteration : int
            Current iteration number.
        fobj : function
            Customized objective function.

        """
        if not isinstance(dtrain, DMatrix):
            raise TypeError('invalid training matrix: {}'.format(type(dtrain).__name__))
        self._validate_features(dtrain)

        if fobj is None:
            _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
>                                                   dtrain.handle))

/opt/h2oai/h2o4gpu/python/lib/python3.6/site-packages/xgboost/core.py:1115:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

ret = -1

    def _check_call(ret):
        """Check the return value of C API call

        This function will raise exception when error occurs.
        Wrap every API call with this function

        Parameters
        ----------
        ret : int
            return value from API calls
        """
        if ret != 0:
>           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
E           xgboost.core.XGBoostError: Invalid Parameter format for seed expect int but value='1234'

Or it segfaults:

Program received signal SIGSEGV, Segmentation fault.
0x00007fffaf2ee9a0 in length (this=0x7fffffff3290)
    at /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3129
3129    /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h: No such file or directory.
(gdb) bt
#0  0x00007fffaf2ee9a0 in length (this=0x7fffffff3290)
    at /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:3129
#1  __copy<char> (
    s=<error reading variable: Cannot access memory at address 0x10>, 
    dest=@0x1622a20: 0x0)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:482
#2  std::__facet_shims::__numpunct_fill_cache<char> (
    f=0x7fffbc037760 <(anonymous namespace)::ctype_w>, c=0x1622a10)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:514
#3  0x00007fffaf2f49fd in numpunct_shim (c=0x1622a10, 
    f=0x7fffbc037760 <(anonymous namespace)::ctype_w>, this=0xeafae0)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:238
#4  std::locale::facet::_M_sso_shim (
    this=0x7fffbc037760 <(anonymous namespace)::ctype_w>, 
    which=<optimized out>)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:797
#5  0x00007fffaf2a2706 in std::locale::_Impl::_M_install_facet (
    this=0x7fffbc0383e0 <(anonymous namespace)::c_locale_impl>, 
    __idp=<optimized out>, 
    __fp=0x7fffbc037760 <(anonymous namespace)::ctype_w>)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++98/locale.cc:372
#6  0x00007fffaf29ac50 in _M_init_facet<std::ctype<wchar_t> > (
    __facet=0x7fffbc037760 <(anonymous namespace)::ctype_w>, 
    this=0x7fffbc0383e0 <(anonymous namespace)::c_locale_impl>)
    at /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.h:602
#7  std::locale::_Impl::_Impl (
    this=0x7fffbc0383e0 <(anonymous namespace)::c_locale_impl>, 
    __refs=<optimized out>)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++98/locale_init.cc:509
#8  0x00007fffaf29b615 in std::locale::_S_initialize_once ()
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++98/locale_init.cc:307
#9  0x00007ffff76a2e40 in pthread_once () from /usr/lib64/libpthread.so.0
#10 0x00007fffaf29b661 in __gthread_once (
    __func=0x7fffaf29b600 <std::locale::_S_initialize_once()>, 
    __once=<optimized out>)
    at /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:699
#11 std::locale::_S_initialize ()
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++98/locale_init.cc:316
#12 0x00007fffaf29b6a3 in std::locale::locale (
    this=0x7fffbc038d98 <__gnu_internal::buf_cout_sync+56>)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++98/locale_init.cc:250
#13 0x00007fffaf29d434 in basic_streambuf (this=<optimized out>)
    at /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/streambuf:466
#14 stdio_sync_filebuf (__f=0x7ffff6f87400 <_IO_2_1_stdout_>, 
    this=<optimized out>)
    at /gcc-5.3.0-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/stdio_sync_filebuf.h:80
#15 std::ios_base::Init::Init (this=<optimized out>)
    at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++98/ios_init.cc:85
#16 0x00007fffaef75cd0 in __static_initialization_and_destruction_0 (
    __priority=65535, __initialize_p=1)
    at /usr/local/include/c++/5.3.0/iostream:74
#17 _GLOBAL__sub_I_xgbfi.cc(void) ()
    at /root/repo/xgboost/src/analysis/xgbfi.cc:594
#18 0x00007ffff7dea8f3 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#19 0x00007ffff7def4ce in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#20 0x00007ffff7dea704 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#21 0x00007ffff7deeabb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#22 0x00007ffff7492eeb in dlopen_doit () from /usr/lib64/libdl.so.2
#23 0x00007ffff7dea704 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#24 0x00007ffff74934ed in _dlerror_run () from /usr/lib64/libdl.so.2
#25 0x00007ffff7492f81 in dlopen@@GLIBC_2.2.5 () from /usr/lib64/libdl.so.2
#26 0x00007ffff2347e11 in py_dl_open ()

All test cases from https://github.com/h2oai/h2o4gpu/tree/master/tests/python/open_data/gbm are failing, so it looks like an issue in the parameter parser.

All source code is available at https://github.com/h2oai/h2o4gpu/pull/790
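
For reference, a minimal sketch of how such link flags might be passed through a CMake build; the exact invocation used in the h2o4gpu build isn't shown in this report, so treat the variable choices below as assumptions:

# Hypothetical configure step (sketch): inject the static-link flags into
# the shared-library link line using standard CMake variables.
mkdir -p build && cd build
cmake .. -DUSE_CUDA=ON \
    -DCMAKE_SHARED_LINKER_FLAGS="-static-libgcc -static-libstdc++"
make -j4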

hcho3 commented 5 years ago

-static-libgcc -static-libstdc++

I don't think XGBoost has been tested with this setting. What's your use case for this?

sh1ng commented 5 years ago

Supporting systems with GCC < 5.

trivialfis commented 5 years ago

@sh1ng Could you please provide an easier script for reproducing?

sh1ng commented 5 years ago

Updated the original description.

A more general question: how do you plan to ship the wheel on systems with GCC < 5 and an old version of GLIBCXX?

hcho3 commented 5 years ago

@sh1ng Previously, we used a GCC 4.8 + CentOS 6 Docker image to build XGBoost wheels. We upgraded to GCC 5+ because 4.8 doesn't fully support the C++11 standard. Today I tried compiling XGBoost on my old laptop, and here's what I got:

chohyu01@chohyu01-Lenovo-IdeaPad-Y500:~/Desktop/xgboost$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.6 LTS
Release:    14.04
Codename:   trusty

chohyu01@chohyu01-Lenovo-IdeaPad-Y500:~/Desktop/xgboost$  make -j10

[lots of outputs later]

g++ -c -DDMLC_LOG_CUSTOMIZE=1 -std=c++11 -Wall -Wno-unknown-pragmas -Iinclude   -Idmlc-core/include -Irabit/include -I/include -O3 -funroll-loops -msse2 -fPIC -fopenmp src/tree/updater_colmaker.cc -o build/tree/updater_colmaker.o
src/tree/tree_model.cc: In constructor ‘xgboost::GraphvizGenerator::GraphvizGenerator(const xgboost::FeatureMap&, const string&, bool)’:
src/tree/tree_model.cc:465:55: error: invalid initialization of non-const reference of type ‘std::stringstream& {aka std::basic_stringstream<char>&}’ from an rvalue of type ‘<brace-enclosed initializer list>’
       TreeGenerator(fmap, with_stats), ss_{SuperT::ss_} {
                                                       ^
make: *** [build/tree/tree_model.o] Error 1

I understand the need to support old systems with old GLIBC versions. @thesuperzapper raised a similar request in https://github.com/dmlc/xgboost/pull/4538#issuecomment-500704601. There's also a similar issue with the LIBGOMP version: https://github.com/dmlc/xgboost/pull/4306#issuecomment-495304573.

However, using GCC 4.8 has its own cost: since it doesn't fully support C++11, it would force developers to implement (potentially time-consuming) workarounds. This is why I'm not in favor of reverting to GCC 4.8. Do you have any good suggestions?

hcho3 commented 5 years ago

Some ideas: 1) Use MUSL to replace GLIBC; MUSL is designed to allow full static linking: https://www.musl-libc.org. 2) It appears that we can actually statically link LIBGOMP with the right set of compilation flags (see the link-line sketch below): https://stackoverflow.com/questions/23869981/linking-openmp-statically-with-gcc?rq=1

Maybe we can start an experimental GitHub repository and get XGBoost compiled with 1) and 2). If we can remove dynamic linking against GLIBC and LIBGOMP, then we could keep GCC 5+ and still support old systems.
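
To make idea 2) concrete, here is an untested sketch of such a link line, assuming GNU ld; the object list and library order are illustrative, not the actual XGBoost link command:

# Sketch only (untested): -fopenmp is deliberately omitted at link time so
# that g++ does not append its own dynamic -lgomp; instead libgomp is
# requested explicitly between -Wl,-Bstatic and -Wl,-Bdynamic, so the
# linker takes the static archive for it and stays dynamic for the rest.
g++ -shared -o lib/libxgboost.so build/*.o \
    -static-libgcc -static-libstdc++ \
    -Wl,-Bstatic -lgomp -Wl,-Bdynamic \
    -lpthread -lm -ldl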

hcho3 commented 5 years ago

@CodingCat @chenqin @yinlou In your experience, are there still many Spark clusters running old Linux?

sh1ng commented 5 years ago

It'd be great if one of you could help with 2).

Craigacp commented 5 years ago

CentOS/RHEL/Oracle Linux 7 are still valid platforms and worth supporting. I managed to compile the 0.90 release on OL6 without too much trouble using a GCC 4.8 compiler. The binary runs on OL6, OL7 and a newer Ubuntu.

It's possible to get updated compilers on RHEL7 and OL7 using the devtoolset (https://developers.redhat.com/products/developertoolset/hello-world#fndtn-windows, https://docs.oracle.com/cd/E37670_01/E59096/html/section_zlg_m3g_dq.html). Binaries compiled with these tools seem to work fine on stock releases of OL7 (as that's our deployment environment and I don't mess with the libraries on it).
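
For anyone following along, the usual devtoolset setup on CentOS 7 looks roughly like this; a sketch, with version 7 chosen arbitrarily and package names assuming the SCL repository:

# Enable Software Collections, install a newer toolchain, and start a
# shell with the devtoolset g++ first on PATH. Binaries built this way
# still link against the base system glibc, which is why they run on
# stock CentOS/OL 7.
sudo yum install -y centos-release-scl
sudo yum install -y devtoolset-7-gcc devtoolset-7-gcc-c++
scl enable devtoolset-7 bash
g++ --version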

hcho3 commented 5 years ago

@Craigacp GCC 4.8 won't actually work: you can compile XGBoost 0.90 with it, but not run it, because <regex> is broken in GCC 4.8: http://www.michaelbrich.com/no-working-around-broken-c11-regex-in-gcc-4-8/. In addition, I think XGBoost developers would like to use the full range of features available in C++11; using GCC 4.8 would force them to adopt inconvenient workarounds and hamper developer productivity. For example, some code recently added after 0.90 relies on full C++11 support and does not compile with GCC 4.8: https://github.com/dmlc/xgboost/issues/4724#issuecomment-518061534

A better way forward would be to use GCC 5+ but generate static binaries that can run on older platforms.
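
A quick way to see the <regex> breakage first-hand (a sketch; on GCC 4.8 this typically prints 0 because the match silently fails, or throws std::regex_error, while GCC 5+ prints 1):

# GCC 4.8 ships <regex> headers that compile fine, but much of the
# implementation remained a non-functional stub until GCC 4.9.
cat > regex_test.cc <<'EOF'
#include <iostream>
#include <regex>
int main() {
    std::cout << std::regex_match("soup", std::regex("s[aeiou]+p")) << std::endl;
    return 0;
}
EOF
g++ -std=c++11 regex_test.cc -o regex_test && ./regex_test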

Craigacp commented 5 years ago

Binaries compiled using the dev tools version of gcc from RHEL7 & OL7 run fine on versions of those operating systems without the dev tools installed. So you can compile using GCC 5 or later on that platform, but it still binds to the base glibc.

I've been running the builds of XGBoost 0.90 that I compiled on OL6 and everything seems to work fine. What code path is <regex> in? I might not be hitting it with my single-node Java use cases.

hcho3 commented 5 years ago

@Craigacp Currently, the CLI config parser uses <regex>.

So you can compile using GCC 5 or later on that platform, but it still binds to the base glibc.

Indeed, you are right. I just checked all the dependencies of libxgboost.so by running

hcho3@ubuntu# objdump -T libxgboost.so

and got the following symbols:

(libxgboost.so was created by compiling XGBoost inside a CentOS 6 Docker container with devtoolset-4. See the container at https://github.com/dmlc/xgboost/blob/master/tests/ci_build/Dockerfile.gpu_build)

GLIBC_2.4 should be compatible with CentOS 6.x, according to https://pkgs.org/download/libc.so.6(GLIBC_2.4)

So I suppose we only need to remove the hard dependency on LIBGOMP then. See #4489
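
For anyone repeating this check, a small sketch that lists which versioned symbols a binary actually requires:

# Extract every versioned symbol reference from the shared library and
# deduplicate, to see the minimum GLIBC/GLIBCXX/GOMP versions needed.
objdump -T libxgboost.so \
    | grep -oE '(GLIBCXX|GLIBC|CXXABI|GOMP|OMP)_[0-9.]+' \
    | sort -uV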

hcho3 commented 5 years ago

After some googling, here's information I found:

Symbol           Package                   Availability
CXXABI_1.3.3     libstdc++-4.4.7-23.el6    CentOS 6.10
GCC_3.0          libgcc-4.4.7-23.el6       CentOS 6.10
GLIBCXX_3.4.11   libstdc++-4.4.7-23.el6    CentOS 6.10
GLIBC_2.4        glibc-2.12-1.212.el6      CentOS 6.10
GOMP_4.0         libgomp-4.4.7-23.el6      CentOS 6.10
OMP_1.0          libgomp-4.4.7-23.el6      CentOS 6.10
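
These mappings can be reproduced on a CentOS machine with a capability query; a sketch (on x86_64 the capability string may need a trailing (64bit) suffix):

# Ask yum which package provides a given versioned library symbol.
yum whatprovides 'libstdc++.so.6(GLIBCXX_3.4.11)'
yum whatprovides 'libgomp.so.1(GOMP_4.0)'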

Hmm, I'm still trying to figure out why #4489 happened. Maybe it has to do with Travis CI not having the latest packages for its Trusty target? According to this table at least, I should be able to compile XGBoost with GCC 5.x and then run the compiled binary on CentOS 6.10.

I should probably add Ubuntu Trusty and CentOS 6 targets to the CI testing harness.

@sh1ng Are you targeting platforms that are older than CentOS 6?

sh1ng commented 5 years ago

We use CentOS 7, including on the ppc64le platform.

devtoolset does indeed work on x86_64 for CUDA 10. If I'm not mistaken, it ships its libstdc++ changes as a separate statically linked library. But it doesn't fully meet our needs: we have to support the ppc64le platform and CUDA 9, which may not support GCC 6 (and there's no devtoolset-5).

That's why I built GCC 5.3 from source.

hcho3 commented 5 years ago

@sh1ng So it appears that the current binary distribution (xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl) is already functional on CentOS 6 or newer, as long as the processor architecture is x86_64. I was just able to install and run XGBoost 0.90 inside Ubuntu 14.04 (Trusty) and CentOS 6 with

pip3 install xgboost==0.90

So I'd say that we are pretty solid when it comes to supporting x86-64 systems with old OSes.

We have to support the ppc64le platform and CUDA 9, which may not support GCC 6 (and there's no devtoolset-5).

I'm afraid I won't be of much help when it comes to ppc64le platform. Our CI (https://xgboost-ci.net) covers x86-64 only and to my knowledge, none of the developers here use ppc64le.

hcho3 commented 5 years ago

@sh1ng Actually, it looks like NVIDIA provides PPC64LE Docker image: https://hub.docker.com/r/nvidia/cuda-ppc64le/. This is good news because I can now use QEMU to emulate ppc64le on my machine and build XGBoost.

(Instructions adapted from https://tthtlc.wordpress.com/2018/11/27/how-to-run-ppc-binaries-in-docker/)

cd ${HOME}
wget https://github.com/multiarch/qemu-user-static/releases/download/v2.7.0/qemu-ppc64le-static.tar.gz
tar xvf qemu-ppc64le-static.tar.gz
docker pull nvidia/cuda-ppc64le:8.0-devel-centos7
docker run --rm --privileged multiarch/qemu-user-static:register --reset
docker run --rm -v ${HOME}/qemu-ppc64le-static:/usr/bin/qemu-ppc64le-static \
                -it nvidia/cuda-ppc64le:8.0-devel-centos7 /bin/bash
# Get an interactive Bash session
# Now git clone XGBoost and build it

Compiler versions:

[root@a6c125a16afb build]# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@a6c125a16afb build]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:28:28_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Drat, it's GCC 4.8. I'll need to figure something out here.

sh1ng commented 5 years ago

There's no devtoolset package for ppc64le. CUDA 9 isn't compatible with GCC 6 (there's no devtoolset-5 package). I'd prefer to support the current major version and the previous one.

It'd be awesome to compile XGBoost statically with -static-libstdc++ and get it working.

trivialfis commented 5 years ago

Unless I'm given priority and resources (a POWER machine) for this, I don't think I can be of help. And I can think of a few places where big-endian byte order could cause problems. ;-(