dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

GBLinear: Inplace predict is not supported by current booster. #9110

Open · justurbo opened this issue 1 year ago

justurbo commented 1 year ago

The GBLinear model is not thread-safe and cannot easily be deployed to production. The model is created as:

XGBRegressor(random_state=0, booster="gblinear")

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/mlserver/parallel/worker.py", line 136, in _process_request
    return_value = await method(
                   ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mlserver_xgboost/xgboost.py", line 88, in predict
    outputs = self._get_model_outputs(payload)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/mlserver_xgboost/xgboost.py", line 79, in _get_model_outputs
    y = predict_fn(decoded_request)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xgboost/sklearn.py", line 1114, in predict
    predts = self.get_booster().inplace_predict(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/xgboost/core.py", line 2292, in inplace_predict
    _check_call(
  File "/usr/local/lib/python3.11/site-packages/xgboost/core.py", line 279, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [22:57:35] /workspace/include/xgboost/gbm.h:120: Inplace predict is not supported by current booster.
Stack trace:
  [bt] (0) /usr/local/lib/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x1f69e0) [0xffff736569e0]
  [bt] (1) /usr/local/lib/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x2148a0) [0xffff736748a0]
  [bt] (2) /usr/local/lib/python3.11/site-packages/xgboost/lib/libxgboost.so(+0x65938) [0xffff734c5938]
  [bt] (3) /usr/local/lib/python3.11/site-packages/xgboost/lib/libxgboost.so(XGBoosterPredictFromDense+0xd4) [0xffff734c5db4]
  [bt] (4) /usr/lib/aarch64-linux-gnu/libffi.so.7(+0x6048) [0xffff84b1b048]
  [bt] (5) /usr/lib/aarch64-linux-gnu/libffi.so.7(+0x5770) [0xffff84b1a770]
  [bt] (6) /usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-aarch64-linux-gnu.so(+0x1235c) [0xffff84b4035c]
  [bt] (7) /usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-aarch64-linux-gnu.so(+0x12a90) [0xffff84b40a90]
  [bt] (8) /usr/local/bin/../lib/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x2c8) [0xffff8abf7518]
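
For reference, the same error can be triggered deterministically by calling inplace_predict directly on a gblinear booster. A minimal sketch with synthetic data (this bypasses the scikit-learn wrapper's guard, so it may not be the exact path hit in production):

import numpy as np
import xgboost as xgb

# Train a small gblinear model on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

reg = xgb.XGBRegressor(random_state=0, booster="gblinear", n_estimators=10)
reg.fit(X, y)

# reg.predict(X) normally falls back to DMatrix-based prediction for gblinear,
# but calling inplace_predict on the underlying Booster raises
# "Inplace predict is not supported by current booster."
reg.get_booster().inplace_predict(X)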
trivialfis commented 1 year ago

That's surprising, gblinear should not be able to get there due to this condition https://github.com/dmlc/xgboost/blob/08ce495b5de973033160e7c7b650abf59346a984/python-package/xgboost/sklearn.py#L1097

justurbo commented 1 year ago

That's surprising, gblinear should not be able to get there due to this condition

https://github.com/dmlc/xgboost/blob/08ce495b5de973033160e7c7b650abf59346a984/python-package/xgboost/sklearn.py#L1097

We’re having a really hard time getting GBLinear stable in a highly concurrent production environment; the server keeps crashing due to threading issues. Our best bet for deploying this booster so far has been to precompute predictions for each dataset and store them.
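
A possible stopgap (not verified in this thread) is to avoid inplace_predict entirely and serialize DMatrix-based predictions behind a lock, since Booster.predict is not documented as thread-safe. A minimal sketch, assuming a fitted XGBRegressor named reg:

import threading

import numpy as np
import xgboost as xgb

_predict_lock = threading.Lock()

def safe_predict(reg: xgb.XGBRegressor, X: np.ndarray) -> np.ndarray:
    # DMatrix-based predict works for gblinear (unlike inplace_predict);
    # the lock serializes concurrent requests as a defensive measure.
    with _predict_lock:
        return reg.get_booster().predict(xgb.DMatrix(X))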

trivialfis commented 1 year ago

Do you find it more useful than gbtree/dart?

justurbo commented 1 year ago

Do you find it more useful than gbtree/dart?

Yes, we use it at this gaming website to predict game FPS. GBLinear has been a great success in terms of preserving feature scaling and producing accurate FPS predictions. No other model, tree-based or linear, has matched its performance.

trivialfis commented 1 year ago

Interesting, thank you for sharing! Let me spend some time working on the gblinear booster later.

justurbo commented 1 year ago

Interesting, thank you for sharing! Let me spend some time working on the gblinear booster later.

Would you like us to help you with some tasks? Our production is on fire right now. 🚒 🧑‍🚒

trivialfis commented 1 year ago

At this point, I think the task is to make prediction thread-safe for gblinear. I haven't looked into it before, as there was very little use of gblinear and no feedback whatsoever; we have almost wanted to remove it.

trivialfis commented 1 year ago

However, a simple reproducer of the issue would be really appreciated. With one, I can make a targeted fix first instead of diving into code refactoring to bring gblinear up to parity with gbtree.

justurbo commented 1 year ago

At this point, I think the task is to make prediction thread-safe for gblinear. I haven't looked into it before, as there was very little use of gblinear and no feedback whatsoever; we have almost wanted to remove it.

The crash happens at random while serving GBLinear via FastAPI; unfortunately, I cannot reproduce it on demand.


GBLinear is incredible at providing accurate results while preserving the scaling of features (e.g. ordinal categorical features), which tree models cannot do on a noisy dataset. It would be a sad day if you guys dropped it.

Preset Scaling:
                        game             cpu               gpu resolution    preset upscaling     min1Fps      avgFps  relative, %    gain, %  gain, FPS
0  Call of Duty: Warzone 2.0  Core i9-13900K  GeForce RTX 4090  3840x2160   Minimum    Native  125.594559  201.559128   100.000000   0.000000   0.000000
1  Call of Duty: Warzone 2.0  Core i9-13900K  GeForce RTX 4090  3840x2160     Basic    Native  119.461449  194.001831    96.250580  -3.749420  -7.557297
2  Call of Duty: Warzone 2.0  Core i9-13900K  GeForce RTX 4090  3840x2160  Balanced    Native  112.523071  184.714447    91.642807  -8.357193 -16.844681
3  Call of Duty: Warzone 2.0  Core i9-13900K  GeForce RTX 4090  3840x2160     Ultra    Native  104.779457  173.696991    86.176697 -13.823303 -27.862137
4  Call of Duty: Warzone 2.0  Core i9-13900K  GeForce RTX 4090  3840x2160   Extreme    Native   96.230560  160.949448    79.852226 -20.147774 -40.609680
justurbo commented 1 year ago

I was able to work around this issue by implementing the inference myself:

Format of inference function inputs:

datasets: [ [x1, x2, x3], [x1, x2, x3], [x1, x2, x3], ... ]
coefficients: [ [c11, c12, c13], [c21, c22, c23] ] = np.array([reg.coef_.tolist()[i::len_y] for i in range(len_y)])  # len_y is the number of regression targets
intercepts: [ a1, a2 ] = reg.intercept_

Inference function:

import numpy as np

def linear_regression(datasets: np.ndarray, coefficients: np.ndarray, intercepts: np.ndarray) -> np.ndarray:
    # Broadcast each input row against every output's coefficient vector, sum over
    # the feature axis, then add the per-output intercepts plus XGBoost's default
    # base_score of 0.5 (which gblinear adds on top of the linear term).
    return (datasets[:, np.newaxis] * coefficients).sum(axis=2) + intercepts + 0.5

Server CPU usage decreased drastically.
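
To sanity-check the manual inference against XGBoost's own output, here is a quick comparison using the linear_regression function above. This is a sketch for the single-output case; base_score=0.5 is set explicitly, which is what the + 0.5 term accounts for:

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

# base_score=0.5 matches the constant added in linear_regression above.
reg = xgb.XGBRegressor(booster="gblinear", base_score=0.5, n_estimators=100, random_state=0)
reg.fit(X, y)

# Single-output case: one coefficient row and one intercept.
coefficients = np.asarray(reg.coef_).reshape(1, -1)
intercepts = np.asarray(reg.intercept_).reshape(-1)

manual = linear_regression(X, coefficients, intercepts)
print("max abs difference vs reg.predict:", np.abs(manual.ravel() - reg.predict(X)).max())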