dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Official support for onnx format. #5392

Open trivialfis opened 4 years ago

trivialfis commented 4 years ago

There's an exporter for XGBoost in https://github.com/onnx/onnxmltools . As far as I'm aware, there are no integration tests on XGBoost's side, nor any formal documentation in XGBoost for this ONNX exporter. The last time I looked into it was to fix an issue in the 1.0.0 release candidate that broke the ONNX exporter. But right now the converter should already be in good shape for most use cases.

My proposal for future work on XGBoost's side is to support the converter in onnxmltools by:

• Changing their parser implementation to be based on the JSON model format. JSON will be our future representation for many things, including internal syncing/checkpointing; it is also the basis of our parameter validation and the serialization format of our configuration:

booster.save_config()  # returns a JSON document showing only our internal configuration.

These use cases cannot be replaced by protobuf.

• Adding tests. As always, this is the best way to reduce the chance of accidental breaking changes. There are some corner cases we should look into during testing: for example, is the global bias correctly converted for logistic and exponential models? Is there any floating-point precision loss during the transformation?

• Mentioning it in our documents.

• (Optional) Integrating it into our save_model function.

@sandys @kylejn27 Let's gather the onnx related discussion here. ;-)
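For reference, a minimal sketch of the conversion path discussed above, assuming the current onnxmltools convert_xgboost API and an onnxruntime round-trip check along the lines of the testing bullet; none of this is shipped by XGBoost itself:

# Sketch only: convert an XGBoost model to ONNX via onnxmltools and compare
# predictions with onnxruntime. Assumes onnxmltools and onnxruntime are
# installed; nothing here is part of XGBoost itself.
import numpy as np
import onnxruntime as rt
import xgboost as xgb
from onnxmltools.convert import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType

# Train a small binary classifier on synthetic data.
rng = np.random.RandomState(0)
X = rng.rand(200, 4).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
model = xgb.XGBClassifier(n_estimators=10, max_depth=3)
model.fit(X, y)

# The JSON configuration document mentioned above (internal parameters only).
config_json = model.get_booster().save_config()
print(config_json[:80])

# Convert to ONNX; the input name "input" is arbitrary.
onnx_model = convert_xgboost(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)

# Round-trip check: ONNX Runtime output vs. native predictions. By default the
# classifier's second output is a list of {class: probability} dicts (ZipMap).
sess = rt.InferenceSession(
    onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
)
labels, probas = sess.run(None, {"input": X})
onnx_p1 = np.array([p[1] for p in probas])
np.testing.assert_allclose(onnx_p1, model.predict_proba(X)[:, 1], atol=1e-5)

A comparison like the last line is the kind of test that would surface the global-bias and floating-point corner cases mentioned in the list above.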

sandys commented 4 years ago

Thank you for this!


sandys commented 4 years ago

My personal opinion is not to tie this to the onnxmltools exporter. The amount of effort to do that (along with test cases, etc.) is about the same as creating a "save_onnx/read_onnx" hook that internally creates a JSON model and then transforms it to ONNX.

Also, with reference to https://github.com/dmlc/xgboost/issues/5339 , the goal should be to have an XGBoost ONNX model deployable on the ONNX ecosystem (whether Azure or elsewhere).
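For illustration only, a rough sketch of what such a hook could look like; the save_onnx helper below is hypothetical (it does not exist in XGBoost) and simply wires the JSON-based model through the onnxmltools converter:

# Hypothetical sketch only: neither save_onnx nor read_onnx exists in XGBoost.
# The idea sketched here is that the JSON model acts as the single source of
# truth and the ONNX transformation is delegated to an existing converter
# (onnxmltools in this sketch); a native implementation would instead walk
# the JSON trees directly.
import json
import xgboost as xgb
from onnxmltools.convert import convert_xgboost
from onnxmltools.convert.common.data_types import FloatTensorType


def save_onnx(booster: xgb.Booster, path: str, num_features: int) -> None:
    # The per-tree JSON dump is what a native converter would parse; here it is
    # only inspected, and the actual conversion is handed to onnxmltools
    # (recent versions accept a raw Booster as well as the sklearn wrappers).
    trees = [json.loads(t) for t in booster.get_dump(dump_format="json")]
    assert trees, "booster has no trees to export"

    onnx_model = convert_xgboost(
        booster, initial_types=[("input", FloatTensorType([None, num_features]))]
    )
    with open(path, "wb") as f:
        f.write(onnx_model.SerializeToString())

The read_onnx direction would be the harder half of such a hook, since ONNX tree ensembles would have to be mapped back into XGBoost's JSON schema.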


trivialfis commented 4 years ago

Yup. That's a good plan.