Open justurbo opened 1 year ago
That's surprising, gblinear should not be able to get there due to this condition https://github.com/dmlc/xgboost/blob/08ce495b5de973033160e7c7b650abf59346a984/python-package/xgboost/sklearn.py#L1097
We’re having a really hard time getting GBLinear stable and running in a highly concurrent production environment; the server keeps crashing due to threading issues. Our best bet for deploying this booster so far is to predict each dataset up front and store the results, roughly as in the sketch below.
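(A hypothetical sketch of that workaround, with made-up training data and dataset names, just to illustrate the shape of it:)

```python
# Hypothetical sketch of the precompute-and-store workaround (illustrative
# names and data): run every prediction once, up front, then serve the stored
# results instead of calling predict() from concurrent request handlers.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.random((256, 3)), rng.random(256)

reg = XGBRegressor(random_state=0, booster="gblinear")
reg.fit(X_train, y_train)

# Precompute and cache predictions per dataset; handlers only do a dict lookup.
datasets = {"preset_scaling": rng.random((5, 3))}
prediction_cache = {name: reg.predict(rows) for name, rows in datasets.items()}
```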
Do you find it more useful than gbtree/dart?
Yes, we use it at this gaming website to predict game FPS. GBLinear has been a great success in terms of preserving feature scaling and getting accurate FPS predictions. No other model, tree or linear, has matched its performance.
Interesting, thank you for sharing! Let me spend some time working on the gblinear booster later.
Would you like us to help you with some tasks? Our production is on fire right now. 🚒 🧑‍🚒
At this point, I think the task is to make prediction thread-safe for gblinear. I haven't looked into it before, as there was very little use of gblinear and no feedback whatsoever; we almost want to remove it.
However, a simple reproducer of the issue would be really appreciated. I can make changes for the fix first instead of diving into a code refactor to catch up with gbtree.
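(For reference, a reproducer for this kind of issue would presumably look something like the sketch below; it assumes the crash comes from many threads calling predict() on the same gblinear booster, the way a web server's worker pool would, and uses synthetic data.)

```python
# Hypothetical reproducer sketch: hammer one shared gblinear model with
# concurrent predict() calls from a thread pool. Synthetic data; whether this
# actually triggers the reported crash is an assumption, not a confirmed repro.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X, y = rng.random((1024, 3)), rng.random(1024)

reg = XGBRegressor(random_state=0, booster="gblinear")
reg.fit(X, y)

query = rng.random((64, 3))

def hit(_: int) -> np.ndarray:
    # Every call goes through the same underlying Booster handle.
    return reg.predict(query)

with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(hit, range(10_000)))
```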
The crash happens at random while serving GBLinear via FastAPI; unfortunately, I cannot reproduce it on demand.
GBLinear is incredible at providing accurate results while preserving the scaling of features (e.g. ordinal categorical features), which cannot be done on a noisy dataset using tree models. It would be a sad day if you dropped it.
Preset Scaling:
|   | game | cpu | gpu | resolution | preset | upscaling | min1Fps | avgFps | relative, % | gain, % | gain, FPS |
|---|------|-----|-----|------------|--------|-----------|---------|--------|-------------|---------|-----------|
| 0 | Call of Duty: Warzone 2.0 | Core i9-13900K | GeForce RTX 4090 | 3840x2160 | Minimum | Native | 125.594559 | 201.559128 | 100.000000 | 0.000000 | 0.000000 |
| 1 | Call of Duty: Warzone 2.0 | Core i9-13900K | GeForce RTX 4090 | 3840x2160 | Basic | Native | 119.461449 | 194.001831 | 96.250580 | -3.749420 | -7.557297 |
| 2 | Call of Duty: Warzone 2.0 | Core i9-13900K | GeForce RTX 4090 | 3840x2160 | Balanced | Native | 112.523071 | 184.714447 | 91.642807 | -8.357193 | -16.844681 |
| 3 | Call of Duty: Warzone 2.0 | Core i9-13900K | GeForce RTX 4090 | 3840x2160 | Ultra | Native | 104.779457 | 173.696991 | 86.176697 | -13.823303 | -27.862137 |
| 4 | Call of Duty: Warzone 2.0 | Core i9-13900K | GeForce RTX 4090 | 3840x2160 | Extreme | Native | 96.230560 | 160.949448 | 79.852226 | -20.147774 | -40.609680 |
I was able to solve this issue by implementing inference on my own:
Format of inference function inputs:
datasets: [ [x1, x2, x3], [x1, x2, x3], [x1, x2, x3], ... ]
coefficients: [ [c11, c12, c13], [c21, c22, c23] ] = np.array([reg.coef_.tolist()[i::len_y] for i in range(len_y)])
intercepts: [ a1, a2 ] = reg.intercept_
Inference function:
import numpy as np

def linear_regression(datasets: np.ndarray, coefficients: np.ndarray, intercepts: np.ndarray):
    # Broadcast each input row against every target's coefficient vector, sum over
    # features, then add the per-target intercept and the constant 0.5 offset.
    return (datasets[:, np.newaxis] * coefficients).sum(axis=2) + intercepts + 0.5
Server CPU usage decreased drastically.
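As a sanity check, the manual inference can be compared against the booster's own output. A minimal sketch, assuming a single-output gblinear regressor with base_score pinned to 0.5 (the constant added in linear_regression above); the data is synthetic:

```python
# Sanity-check sketch (assumptions: single target, base_score fixed at 0.5 so
# the +0.5 in linear_regression above matches the booster's offset).
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X, y = rng.random((512, 3)), rng.random(512)

reg = XGBRegressor(random_state=0, booster="gblinear", base_score=0.5)
reg.fit(X, y)

coefficients = np.atleast_2d(reg.coef_)     # shape (n_targets, n_features)
intercepts = np.atleast_1d(reg.intercept_)  # shape (n_targets,)

queries = rng.random((5, 3))
manual = linear_regression(queries, coefficients, intercepts)
print(np.allclose(manual.ravel(), reg.predict(queries), atol=1e-4))  # expect True
```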
GBLinear model is not thread-safe and cannot be easily deployed to production.
XGBRegressor(random_state=0, booster="gblinear")
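For context, the kind of serving setup in which the crashes appear presumably looks like the sketch below (a hypothetical illustration, not the actual production code): FastAPI runs sync endpoint handlers in a thread pool, so concurrent requests can call predict() on the same booster from multiple threads at once.

```python
# Hypothetical serving sketch (not the real production code). FastAPI executes
# sync endpoints on a thread pool, so parallel requests invoke predict() on the
# same gblinear booster concurrently.
from typing import List

import numpy as np
from fastapi import FastAPI
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
reg = XGBRegressor(random_state=0, booster="gblinear")
reg.fit(rng.random((256, 3)), rng.random(256))

app = FastAPI()

@app.post("/predict")
def predict(rows: List[List[float]]) -> List[float]:
    # Each request may run on a different worker thread.
    return reg.predict(np.asarray(rows, dtype=np.float32)).tolist()
```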