The single-instance prediction feature is experimental, and so far I haven't figured out how to optimize its performance. If you have any suggestions, feel free to write here.
You might want to use the C function directly instead of going through the Python wrapper treelite_runtime. Take a look at https://treelite.readthedocs.io/en/latest/tutorials/deploy.html#option-2-deploy-prediciton-code-only
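For example, here is a rough sketch of that approach from Python, using ctypes to call the compiled library directly and bypassing treelite_runtime. The get_num_feature/predict entry points and the Entry union follow the generated header described in that tutorial; exact names and signatures can differ between treelite versions, so check the generated header.h before relying on this.

import ctypes

lib = ctypes.CDLL('./xgb_clf10.so')

class Entry(ctypes.Union):
    # mirrors `union Entry` from the generated header (missing/fvalue/qvalue share 4 bytes)
    _fields_ = [('missing', ctypes.c_int),
                ('fvalue', ctypes.c_float),
                ('qvalue', ctypes.c_int)]

lib.get_num_feature.restype = ctypes.c_size_t
lib.predict.restype = ctypes.c_float
lib.predict.argtypes = [ctypes.POINTER(Entry), ctypes.c_int]

num_feature = lib.get_num_feature()
entry = (Entry * num_feature)()

def predict_one(row):
    # row: 1-D array of length num_feature with no missing values;
    # writing fvalue is enough here, since `missing` shares the same bytes
    # (for a missing feature you would set entry[i].missing = -1 instead)
    for i, v in enumerate(row):
        entry[i].fvalue = v
    return lib.predict(entry, 0)   # pred_margin=0 -> transformed probability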
Hello,
I gave the C API a quick try using that doc, manually populating the "inst" array with the values of a single row. Prediction takes ~35-40 microseconds, which is very good. As for the Python slowness, I glanced at the line_profiler output for the predict_instance function; these snippets/loops take most of the time:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
335 243 293.0 1.2 9.7 for i in range(self.num_feature_):
336 242 364.0 1.5 12.1 entry[i].missing = -1
...
354 1 1.0 1.0 0.0 if missing is None or np.isnan(missing):
355 243 306.0 1.3 10.1 for i in range(inst.shape[0]):
356 242 1277.0 5.3 42.4 if not np.isnan(inst[i]):
357 242 572.0 2.4 19.0 entry[i].fvalue = inst[i]
The actual prediction takes <5% of the time. One suggestion would be to add an option to skip the check for missing or NaN values altogether when it is already known that the input doesn't contain any (in my case a preprocessing step ensures there are none), so there is no need to check again. Another suggestion is to avoid pure Python loops when copying values from inst to entry[i].fvalue (ideally avoiding the copy altogether); see the sketch below.
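To illustrate the second suggestion, here is a rough sketch of a vectorized fill, assuming entry is a ctypes array of a 4-byte union with int32 missing and float32 fvalue members, as in the profiled code. NumPy views over the same buffer replace the per-element Python loop; the fill_entries helper and the assume_no_missing flag are just illustrative names, not part of treelite.

import ctypes
import numpy as np

class Entry(ctypes.Union):
    # same layout as the union populated in predict_instance above
    _fields_ = [('missing', ctypes.c_int),
                ('fvalue', ctypes.c_float),
                ('qvalue', ctypes.c_int)]

def fill_entries(entry, inst, assume_no_missing=False):
    # zero-copy float32 view over the ctypes buffer: one vectorized assignment
    # replaces the `for i in range(inst.shape[0])` loop
    fview = np.frombuffer(entry, dtype=np.float32)
    fview[:] = inst
    if not assume_no_missing:
        # mark NaN inputs as missing by writing -1 into an int32 view of the same memory
        iview = np.frombuffer(entry, dtype=np.int32)
        iview[np.isnan(inst)] = -1

# usage: 242 features, as implied by the profile above
num_feature = 242
entry = (Entry * num_feature)()
row = np.random.rand(num_feature).astype(np.float32)
fill_entries(entry, row, assume_no_missing=True)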
Regards, Vedran
Closing this, since I decided to drop the single-instance prediction feature.
Hello,
I just discovered the treelite project and wanted to do a quick prediction test against default xgboost. I followed the quick start document and that worked flawlessly. Now on to prediction:
Default xgboost:
In [293]: xgb_clf10.predict_proba(d)
Out[293]: array([[9.9999821e-01, 1.7704634e-06]], dtype=float32)
In [294]: %timeit xgb_clf10.predict_proba(d)
223 µs ± 503 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Now treelite:
In [295]: predictor.predict_instance(tdata[1])
Out[295]: array(1.7704634e-06, dtype=float32)
In [296]: %timeit predictor.predict_instance(tdata[1])
1.01 ms ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I cannot use batch prediction, just one row at a time. Both use a single thread. The shared lib was generated using:
model.export_lib(toolchain='gcc', libpath='./xgb_clf10.so', verbose=True)
using gcc version 6.3.0 on Debian. Let me know if you need more info and whether the result above is expected.
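For comparison, one more thing that could be timed is a single row pushed through the regular batch API. This is only a rough sketch: it assumes this treelite version provides treelite_runtime.Batch.from_npy2d and Predictor.predict as shown (names may differ in other releases), and that tdata is a 2-D float array.

import numpy as np
import treelite_runtime

predictor = treelite_runtime.Predictor('./xgb_clf10.so', verbose=True)
single_row = np.ascontiguousarray(tdata[1:2], dtype=np.float32)   # shape (1, num_feature)
batch = treelite_runtime.Batch.from_npy2d(single_row)
out = predictor.predict(batch)   # array with one probability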
Regards, Vedran