Open hcho3 opened 4 years ago
Looked into this a little bit. The implementation isn't difficult, but it depends on https://github.com/dmlc/xgboost/pull/6605 because it uses feature names/types. I will try to figure out a better way to store that information.
Currently, important functions such as feature importances and evaluation metrics rely on parsing text strings, specifically the text output of the model dump function. For example:
https://github.com/dmlc/xgboost/blob/68c55a37d9bb680fe435f1d011e5fea62be97d22/python-package/xgboost/core.py#L1797-L1832
https://github.com/dmlc/xgboost/blob/68c55a37d9bb680fe435f1d011e5fea62be97d22/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/Booster.java#L509-L540
https://github.com/dmlc/xgboost/blob/68c55a37d9bb680fe435f1d011e5fea62be97d22/python-package/xgboost/training.py#L85-L91
https://github.com/dmlc/xgboost/blob/68c55a37d9bb680fe435f1d011e5fea62be97d22/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/Booster.java#L240-L255
Also see https://github.com/dmlc/xgboost/issues/4665#issuecomment-532932603 https://github.com/dmlc/xgboost/issues/4665#issuecomment-532945623
We should aim to eliminate all such uses of text parsing, since a slight change in the text dump will cause all these functions to break.
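To make the fragility concrete, here is a minimal sketch of the kind of string splitting involved. It is not the code from the links above. `count_splits_from_text` is a hypothetical helper, and the dump lines are hand-written to resemble the text dump format rather than produced by a real model. It shows the parser silently returning the wrong feature name once the name itself contains `<`:

```python
from collections import Counter


def count_splits_from_text(dump_lines):
    """Count how often each feature is used as a split, by slicing text.

    Hypothetical helper for illustration only; it assumes every split
    line looks like ``<id>:[<feature><<threshold>] yes=...,no=...``.
    """
    counts = Counter()
    for line in dump_lines:
        parts = line.split("[")
        if len(parts) == 1:
            # No '[' on the line, so treat it as a leaf.
            continue
        # Take everything up to ']' and then up to the first '<'.
        feature = parts[1].split("]")[0].split("<")[0]
        counts[feature] += 1
    return counts


# Hand-written sample lines (an assumption, not real model output).
plain = [
    "0:[f0<0.5] yes=1,no=2,missing=1",
    "\t1:leaf=0.1",
    "\t2:[f1<1.5] yes=3,no=4,missing=3",
]
# A feature whose name contains '<' is cut short without any error.
tricky = ["0:[size<cm><0.5] yes=1,no=2,missing=1"]

plain_counts = count_splits_from_text(plain)
tricky_counts = count_splits_from_text(tricky)
```

With these inputs, `plain_counts` holds `f0` and `f1` as expected, while `tricky_counts` reports a feature called `size` instead of `size<cm>`. Nothing raises, so the caller has no way to notice.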
Proposed replacement:
Now that we have a functioning JSON library as well as a numeric printing function (`charconv`) in XGBoost, it should be doable.
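As a rough sketch of what consuming structured output could look like on the Python side, the snippet below counts split features from JSON tree strings using only the standard library. This is an illustration under assumptions, not the proposed implementation: `count_splits_from_json` is a hypothetical helper, and the sample tree is hand-written with key names (`split`, `leaf`, `children`) modeled on the JSON model dump.

```python
import json
from collections import Counter


def count_splits_from_json(tree_json_strings):
    """Count how often each feature is used as a split.

    Hypothetical helper; assumes each string is one tree whose internal
    nodes carry "split" and "children", and whose leaves carry "leaf".
    """
    counts = Counter()
    for text in tree_json_strings:
        stack = [json.loads(text)]
        while stack:
            node = stack.pop()
            if "leaf" in node:
                continue
            # The feature name is a JSON string, so no delimiter guessing.
            counts[node["split"]] += 1
            stack.extend(node.get("children", []))
    return counts


# Hand-written sample tree (an assumption, not real model output).
sample_tree = json.dumps({
    "nodeid": 0, "split": "size<cm>", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": 0.1},
        {"nodeid": 2, "split": "f1", "split_condition": 1.5,
         "yes": 3, "no": 4, "missing": 3,
         "children": [
             {"nodeid": 3, "leaf": 0.2},
             {"nodeid": 4, "leaf": 0.3},
         ]},
    ],
})

json_counts = count_splits_from_json([sample_tree])
```

Here a feature named `size<cm>` comes through intact, because the name is carried as a JSON string value and is never recovered by splitting on punctuation.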